Posted to user@ignite.apache.org by Айсина Роза <ra...@gpmdata.ru> on 2022/07/18 11:49:08 UTC

How to fix lost partitions gracefully?
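
For context, the only graceful recovery path I am aware of (besides bringing the lost baseline node back) is to clear the LOST state explicitly, either with control.sh --cache reset_lost_partitions PipelineConfig or through the Java API. Below is a minimal sketch of the API route, assuming a server/thick-client node handle and the PipelineConfig cache named in the exception further down; the config path is hypothetical:

    import java.util.Collections;
    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;

    public class ResetLostPartitionsSketch {
        public static void main(String[] args) {
            // Hypothetical Spring config path; any server or thick-client node will do.
            try (Ignite ignite = Ignition.start("ignite-config.xml")) {
                // Partitions currently marked LOST for the cache from the exception below.
                System.out.println("Lost partitions of PipelineConfig: "
                    + ignite.cache("PipelineConfig").lostPartitions());

                // Clear the LOST state once the owning node is back in the baseline
                // (or once the data loss has been accepted).
                ignite.resetLostPartitions(Collections.singleton("PipelineConfig"));
            }
        }
    }

Is that the intended sequence, or is there a way to avoid partitions being marked lost in the first place when a baseline node restarts? The logs from the two server nodes follow.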

Common labels: {"container":"ignite-node","instance":"ignite","job":"rec-matchtv/ignite","namespace":"rec-matchtv","node_name":"node2","app":"ignite"}
Line limit: 5000
Total bytes processed: 353 kB


2022-07-16T23:39:57Z	{"log":"    ^-- Striped thread pool [active=0, idle=8, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:39:57.626413876Z"}
2022-07-16T23:39:57Z	{"log":"    ^-- System thread pool [active=0, idle=7, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:39:57.626400848Z"}
2022-07-16T23:39:57Z	{"log":"    ^-- Public thread pool [active=0, idle=0, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:39:57.626365211Z"}
2022-07-16T23:39:57Z	{"log":"    ^-- Outbound messages queue [size=0]\n","stream":"stderr","time":"2022-07-16T23:39:57.626356366Z"}
2022-07-16T23:39:57Z	{"log":"    ^-- Ignite persistence [used=3292MB]\n","stream":"stderr","time":"2022-07-16T23:39:57.626347562Z"}
2022-07-16T23:39:57Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=0MB]\n","stream":"stderr","time":"2022-07-16T23:39:57.626338559Z"}
2022-07-16T23:39:57Z	{"log":"    ^--   volatileDsMemPlc region [type=user, persistence=false, lazyAlloc=true,\n","stream":"stderr","time":"2022-07-16T23:39:57.626329415Z"}
2022-07-16T23:39:57Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=100MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:39:57.626320104Z"}
2022-07-16T23:39:57Z	{"log":"    ^--   TxLog region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:39:57.626310828Z"}
2022-07-16T23:39:57Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.87%, allocRam=0MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:39:57.626301248Z"}
2022-07-16T23:39:57Z	{"log":"    ^--   metastoreMemPlc region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:39:57.62629116Z"}
2022-07-16T23:39:57Z	{"log":"      ...  initCfg=256MB, maxCfg=1228MB, usedRam=1180MB, freeRam=3.92%, allocRam=1228MB, allocTotal=3292MB]\n","stream":"stderr","time":"2022-07-16T23:39:57.626282027Z"}
2022-07-16T23:39:57Z	{"log":"    ^--   default region [type=default, persistence=true, lazyAlloc=true,\n","stream":"stderr","time":"2022-07-16T23:39:57.626272945Z"}
2022-07-16T23:39:57Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.99%, allocRam=100MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:39:57.626263662Z"}
2022-07-16T23:39:57Z	{"log":"    ^--   sysMemPlc region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:39:57.626254268Z"}
2022-07-16T23:39:57Z	{"log":"    ^-- Page memory [pages=298771]\n","stream":"stderr","time":"2022-07-16T23:39:57.626245367Z"}
2022-07-16T23:39:57Z	{"log":"    ^-- Off-heap memory [used=1180MB, free=27.51%, allocated=1428MB]\n","stream":"stderr","time":"2022-07-16T23:39:57.626236143Z"}
2022-07-16T23:39:57Z	{"log":"    ^-- Heap [used=546MB, free=81.6%, comm=2969MB]\n","stream":"stderr","time":"2022-07-16T23:39:57.626226767Z"}
2022-07-16T23:39:57Z	{"log":"    ^-- CPU [CPUs=1, curLoad=8.33%, avgLoad=1.08%, GC=0%]\n","stream":"stderr","time":"2022-07-16T23:39:57.626217084Z"}
2022-07-16T23:39:57Z	{"log":"    ^-- Network [addrs=[10.233.96.234, 127.0.0.1], discoPort=47500, commPort=47100]\n","stream":"stderr","time":"2022-07-16T23:39:57.626207696Z"}
2022-07-16T23:39:57Z	{"log":"    ^-- Cluster [hosts=3, CPUs=17, servers=2, clients=1, topVer=767, minorTopVer=0]\n","stream":"stderr","time":"2022-07-16T23:39:57.62619829Z"}
2022-07-16T23:39:57Z	{"log":"    ^-- Node [id=6daf16ec, uptime=01:47:00.451]\n","stream":"stderr","time":"2022-07-16T23:39:57.626189011Z"}
2022-07-16T23:39:57Z	{"log":"Metrics for local node (to disable set 'metricsLogFrequency' to 0)\n","stream":"stderr","time":"2022-07-16T23:39:57.626178471Z"}
2022-07-16T23:39:57Z	{"log":"[23:39:57,625][INFO][grid-timeout-worker-#22][IgniteKernal] \n","stream":"stderr","time":"2022-07-16T23:39:57.626139547Z"}
2022-07-16T23:38:57Z	{"log":"    ^-- Striped thread pool [active=0, idle=8, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:38:57.620118524Z"}
2022-07-16T23:38:57Z	{"log":"    ^-- System thread pool [active=0, idle=7, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:38:57.620093638Z"}
2022-07-16T23:38:57Z	{"log":"    ^-- Public thread pool [active=0, idle=0, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:38:57.620086085Z"}
2022-07-16T23:38:57Z	{"log":"    ^-- Outbound messages queue [size=0]\n","stream":"stderr","time":"2022-07-16T23:38:57.620078464Z"}
2022-07-16T23:38:57Z	{"log":"    ^-- Ignite persistence [used=3292MB]\n","stream":"stderr","time":"2022-07-16T23:38:57.620071414Z"}
2022-07-16T23:38:57Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=0MB]\n","stream":"stderr","time":"2022-07-16T23:38:57.620063916Z"}
2022-07-16T23:38:57Z	{"log":"    ^--   volatileDsMemPlc region [type=user, persistence=false, lazyAlloc=true,\n","stream":"stderr","time":"2022-07-16T23:38:57.620056643Z"}
2022-07-16T23:38:57Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=100MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:38:57.620049345Z"}
2022-07-16T23:38:57Z	{"log":"    ^--   TxLog region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:38:57.62004176Z"}
2022-07-16T23:38:57Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.87%, allocRam=0MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:38:57.620034046Z"}
2022-07-16T23:38:57Z	{"log":"    ^--   metastoreMemPlc region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:38:57.620026384Z"}
2022-07-16T23:38:57Z	{"log":"      ...  initCfg=256MB, maxCfg=1228MB, usedRam=1180MB, freeRam=3.92%, allocRam=1228MB, allocTotal=3292MB]\n","stream":"stderr","time":"2022-07-16T23:38:57.620018288Z"}
2022-07-16T23:38:57Z	{"log":"    ^--   default region [type=default, persistence=true, lazyAlloc=true,\n","stream":"stderr","time":"2022-07-16T23:38:57.620011034Z"}
2022-07-16T23:38:57Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.99%, allocRam=100MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:38:57.620003445Z"}
2022-07-16T23:38:57Z	{"log":"    ^--   sysMemPlc region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:38:57.619995967Z"}
2022-07-16T23:38:57Z	{"log":"    ^-- Page memory [pages=298771]\n","stream":"stderr","time":"2022-07-16T23:38:57.619988999Z"}
2022-07-16T23:38:57Z	{"log":"    ^-- Off-heap memory [used=1180MB, free=27.51%, allocated=1428MB]\n","stream":"stderr","time":"2022-07-16T23:38:57.619981789Z"}
2022-07-16T23:38:57Z	{"log":"    ^-- Heap [used=538MB, free=81.85%, comm=2969MB]\n","stream":"stderr","time":"2022-07-16T23:38:57.61997417Z"}
2022-07-16T23:38:57Z	{"log":"    ^-- CPU [CPUs=1, curLoad=1%, avgLoad=1.08%, GC=0%]\n","stream":"stderr","time":"2022-07-16T23:38:57.619966734Z"}
2022-07-16T23:38:57Z	{"log":"    ^-- Network [addrs=[10.233.96.234, 127.0.0.1], discoPort=47500, commPort=47100]\n","stream":"stderr","time":"2022-07-16T23:38:57.619958344Z"}
2022-07-16T23:38:57Z	{"log":"    ^-- Cluster [hosts=3, CPUs=17, servers=2, clients=1, topVer=767, minorTopVer=0]\n","stream":"stderr","time":"2022-07-16T23:38:57.619950551Z"}
2022-07-16T23:38:57Z	{"log":"    ^-- Node [id=6daf16ec, uptime=01:46:00.448]\n","stream":"stderr","time":"2022-07-16T23:38:57.619942935Z"}
2022-07-16T23:38:57Z	{"log":"Metrics for local node (to disable set 'metricsLogFrequency' to 0)\n","stream":"stderr","time":"2022-07-16T23:38:57.619932599Z"}
2022-07-16T23:38:57Z	{"log":"[23:38:57,619][INFO][grid-timeout-worker-#22][IgniteKernal] \n","stream":"stderr","time":"2022-07-16T23:38:57.619874668Z"}
2022-07-16T23:37:57Z	{"log":"    ^-- Striped thread pool [active=0, idle=8, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:37:57.613097776Z"}
2022-07-16T23:37:57Z	{"log":"    ^-- System thread pool [active=0, idle=8, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:37:57.613089427Z"}
2022-07-16T23:37:57Z	{"log":"    ^-- Public thread pool [active=0, idle=2, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:37:57.613081292Z"}
2022-07-16T23:37:57Z	{"log":"    ^-- Outbound messages queue [size=0]\n","stream":"stderr","time":"2022-07-16T23:37:57.613072669Z"}
2022-07-16T23:37:57Z	{"log":"    ^-- Ignite persistence [used=3292MB]\n","stream":"stderr","time":"2022-07-16T23:37:57.613064815Z"}
2022-07-16T23:37:57Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=0MB]\n","stream":"stderr","time":"2022-07-16T23:37:57.613056435Z"}
2022-07-16T23:37:57Z	{"log":"    ^--   volatileDsMemPlc region [type=user, persistence=false, lazyAlloc=true,\n","stream":"stderr","time":"2022-07-16T23:37:57.613028471Z"}
2022-07-16T23:37:57Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=100MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:37:57.613019366Z"}
2022-07-16T23:37:57Z	{"log":"    ^--   TxLog region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:37:57.613009967Z"}
2022-07-16T23:37:57Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.87%, allocRam=0MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:37:57.613000467Z"}
2022-07-16T23:37:57Z	{"log":"    ^--   metastoreMemPlc region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:37:57.612991204Z"}
2022-07-16T23:37:57Z	{"log":"      ...  initCfg=256MB, maxCfg=1228MB, usedRam=1180MB, freeRam=3.92%, allocRam=1228MB, allocTotal=3292MB]\n","stream":"stderr","time":"2022-07-16T23:37:57.61298126Z"}
2022-07-16T23:37:57Z	{"log":"    ^--   default region [type=default, persistence=true, lazyAlloc=true,\n","stream":"stderr","time":"2022-07-16T23:37:57.612972273Z"}
2022-07-16T23:37:57Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.99%, allocRam=100MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:37:57.612963133Z"}
2022-07-16T23:37:57Z	{"log":"    ^--   sysMemPlc region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:37:57.612954034Z"}
2022-07-16T23:37:57Z	{"log":"    ^-- Page memory [pages=298771]\n","stream":"stderr","time":"2022-07-16T23:37:57.612945358Z"}
2022-07-16T23:37:57Z	{"log":"    ^-- Off-heap memory [used=1180MB, free=27.51%, allocated=1428MB]\n","stream":"stderr","time":"2022-07-16T23:37:57.61293633Z"}
2022-07-16T23:37:57Z	{"log":"    ^-- Heap [used=532MB, free=82.07%, comm=2969MB]\n","stream":"stderr","time":"2022-07-16T23:37:57.612927275Z"}
2022-07-16T23:37:57Z	{"log":"    ^-- CPU [CPUs=1, curLoad=0.67%, avgLoad=1.08%, GC=0%]\n","stream":"stderr","time":"2022-07-16T23:37:57.61291801Z"}
2022-07-16T23:37:57Z	{"log":"    ^-- Network [addrs=[10.233.96.234, 127.0.0.1], discoPort=47500, commPort=47100]\n","stream":"stderr","time":"2022-07-16T23:37:57.612908207Z"}
2022-07-16T23:37:57Z	{"log":"    ^-- Cluster [hosts=3, CPUs=17, servers=2, clients=1, topVer=767, minorTopVer=0]\n","stream":"stderr","time":"2022-07-16T23:37:57.612898827Z"}
2022-07-16T23:37:57Z	{"log":"    ^-- Node [id=6daf16ec, uptime=01:45:00.442]\n","stream":"stderr","time":"2022-07-16T23:37:57.612889288Z"}
2022-07-16T23:37:57Z	{"log":"Metrics for local node (to disable set 'metricsLogFrequency' to 0)\n","stream":"stderr","time":"2022-07-16T23:37:57.612877492Z"}
2022-07-16T23:37:57Z	{"log":"[23:37:57,612][INFO][grid-timeout-worker-#22][IgniteKernal] \n","stream":"stderr","time":"2022-07-16T23:37:57.612827804Z"}
2022-07-16T23:37:21Z	{"log":"[23:37:20,953][INFO][db-checkpoint-thread-#68][Checkpointer] Skipping checkpoint (no pages were modified) [checkpointBeforeLockTime=9ms, checkpointLockWait=0ms, checkpointListenersExecuteTime=7ms, checkpointLockHoldTime=8ms, reason='timeout']\n","stream":"stderr","time":"2022-07-16T23:37:20.953707834Z"}
2022-07-16T23:37:12Z	{"log":"[23:37:12,880][INFO][exchange-worker-#62][GridCachePartitionExchangeManager] AffinityTopologyVersion [topVer=767, minorTopVer=0], evt=NODE_LEFT, evtNode=5f797546-6f89-460c-b102-85d4ec7d1189, client=false]\n","stream":"stderr","time":"2022-07-16T23:37:12.880854546Z"}
2022-07-16T23:37:12Z	{"log":"[23:37:12,876][INFO][exchange-worker-#62][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=767, minorTopVer=0], crd=false]\n","stream":"stderr","time":"2022-07-16T23:37:12.87648148Z"}
2022-07-16T23:37:12Z	{"log":"[23:37:12,875][INFO][exchange-worker-#62][GridDhtPartitionsExchangeFuture] Exchange longest local stages [startVer=AffinityTopologyVersion [topVer=767, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=767, minorTopVer=0]]\n","stream":"stderr","time":"2022-07-16T23:37:12.876004855Z"}
2022-07-16T23:37:12Z	{"log":"[23:37:12,875][INFO][exchange-worker-#62][GridDhtPartitionsExchangeFuture] Exchange timings [startVer=AffinityTopologyVersion [topVer=767, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=767, minorTopVer=0], stage=\"Waiting in exchange queue\" (0 ms), stage=\"Exchange parameters initialization\" (0 ms), stage=\"Determine exchange type\" (0 ms), stage=\"Exchange done\" (3 ms), stage=\"Total time\" (3 ms)]\n","stream":"stderr","time":"2022-07-16T23:37:12.875860985Z"}
2022-07-16T23:37:12Z	{"log":"[23:37:12,875][INFO][exchange-worker-#62][GridDhtPartitionsExchangeFuture] Completed partition exchange [localNode=6daf16ec-368f-44d5-9751-fa7b440eb264, exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=767, minorTopVer=0], evt=NODE_LEFT, evtNode=TcpDiscoveryNode [id=5f797546-6f89-460c-b102-85d4ec7d1189, consistentId=5f797546-6f89-460c-b102-85d4ec7d1189, addrs=ArrayList [10.233.96.16, 127.0.0.1], sockAddrs=HashSet [/10.233.96.16:0, /127.0.0.1:0], discPort=0, order=762, intOrder=383, lastExchangeTime=1658008371506, loc=false, ver=2.12.0#20220108-sha1:b1289f75, isClient=true], rebalanced=true, done=true, newCrdFut=null], topVer=AffinityTopologyVersion [topVer=767, minorTopVer=0]]\n","stream":"stderr","time":"2022-07-16T23:37:12.875658082Z"}
2022-07-16T23:37:12Z	{"log":"[23:37:12,872][INFO][exchange-worker-#62][GridDhtPartitionsExchangeFuture] Finish exchange future [startVer=AffinityTopologyVersion [topVer=767, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=767, minorTopVer=0], err=null, rebalanced=true, wasRebalanced=true]\n","stream":"stderr","time":"2022-07-16T23:37:12.872930465Z"}
2022-07-16T23:37:12Z	{"log":"[23:37:12,870][INFO][exchange-worker-#62][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=767, minorTopVer=0], crd=false, evt=NODE_LEFT, evtNode=5f797546-6f89-460c-b102-85d4ec7d1189, customEvt=null, allowMerge=true, exchangeFreeSwitch=false]\n","stream":"stderr","time":"2022-07-16T23:37:12.871147938Z"}
2022-07-16T23:37:12Z	{"log":"[23:37:12,859][INFO][disco-event-worker-#61][GridDiscoveryManager]   ^-- Baseline [id=0, size=2, online=2, offline=0]\n","stream":"stderr","time":"2022-07-16T23:37:12.859720644Z"}
2022-07-16T23:37:12Z	{"log":"[23:37:12,859][INFO][disco-event-worker-#61][GridDiscoveryManager] Topology snapshot [ver=767, locNode=6daf16ec, servers=2, clients=1, state=ACTIVE, CPUs=17, offheap=2.4GB, heap=16.0GB, aliveNodes=[TcpDiscoveryNode [id=b54b6cde-9d34-473f-afdf-9f0a7d843e2f, consistentId=932603da-955c-4b1a-8717-2ce2e875b20c, isClient=false, ver=2.12.0#20220108-sha1:b1289f75], TcpDiscoveryNode [id=6daf16ec-368f-44d5-9751-fa7b440eb264, consistentId=b849e31e-152b-4f1b-afd7-3bc107181677, isClient=false, ver=2.12.0#20220108-sha1:b1289f75], TcpDiscoveryNode [id=260cbabb-4302-49ac-ad53-fe32ca021026, consistentId=260cbabb-4302-49ac-ad53-fe32ca021026, isClient=true, ver=2.12.0#20220108-sha1:b1289f75]]]\n","stream":"stderr","time":"2022-07-16T23:37:12.859482799Z"}
2022-07-16T23:37:12Z	{"log":"[23:37:12,857][INFO][disco-event-worker-#61][GridDiscoveryManager] Node left topology: TcpDiscoveryNode [id=5f797546-6f89-460c-b102-85d4ec7d1189, consistentId=5f797546-6f89-460c-b102-85d4ec7d1189, addrs=ArrayList [10.233.96.16, 127.0.0.1], sockAddrs=HashSet [/10.233.96.16:0, /127.0.0.1:0], discPort=0, order=762, intOrder=383, lastExchangeTime=1658008371506, loc=false, ver=2.12.0#20220108-sha1:b1289f75, isClient=true]\n","stream":"stderr","time":"2022-07-16T23:37:12.858159726Z"}
2022-07-16T23:36:57Z	{"log":"    ^-- Striped thread pool [active=0, idle=8, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:36:57.60197926Z"}
2022-07-16T23:36:57Z	{"log":"    ^-- System thread pool [active=0, idle=7, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:36:57.601975627Z"}
2022-07-16T23:36:57Z	{"log":"    ^-- Public thread pool [active=0, idle=0, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:36:57.601971558Z"}
2022-07-16T23:36:57Z	{"log":"    ^-- Outbound messages queue [size=0]\n","stream":"stderr","time":"2022-07-16T23:36:57.601967838Z"}
2022-07-16T23:36:57Z	{"log":"    ^-- Ignite persistence [used=3292MB]\n","stream":"stderr","time":"2022-07-16T23:36:57.601963757Z"}
2022-07-16T23:36:57Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=0MB]\n","stream":"stderr","time":"2022-07-16T23:36:57.601960024Z"}
2022-07-16T23:36:57Z	{"log":"    ^--   volatileDsMemPlc region [type=user, persistence=false, lazyAlloc=true,\n","stream":"stderr","time":"2022-07-16T23:36:57.60195555Z"}
2022-07-16T23:36:57Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=100MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:36:57.601951601Z"}
2022-07-16T23:36:57Z	{"log":"    ^--   TxLog region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:36:57.601947351Z"}
2022-07-16T23:36:57Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.87%, allocRam=0MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:36:57.601929801Z"}
2022-07-16T23:36:57Z	{"log":"    ^--   metastoreMemPlc region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:36:57.601925449Z"}
2022-07-16T23:36:57Z	{"log":"      ...  initCfg=256MB, maxCfg=1228MB, usedRam=1180MB, freeRam=3.92%, allocRam=1228MB, allocTotal=3292MB]\n","stream":"stderr","time":"2022-07-16T23:36:57.601920855Z"}
2022-07-16T23:36:57Z	{"log":"    ^--   default region [type=default, persistence=true, lazyAlloc=true,\n","stream":"stderr","time":"2022-07-16T23:36:57.601916504Z"}
2022-07-16T23:36:57Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.99%, allocRam=100MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:36:57.601911939Z"}
2022-07-16T23:36:57Z	{"log":"    ^--   sysMemPlc region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:36:57.601907657Z"}
2022-07-16T23:36:57Z	{"log":"    ^-- Page memory [pages=298771]\n","stream":"stderr","time":"2022-07-16T23:36:57.601903333Z"}
2022-07-16T23:36:57Z	{"log":"    ^-- Off-heap memory [used=1180MB, free=27.51%, allocated=1428MB]\n","stream":"stderr","time":"2022-07-16T23:36:57.601898667Z"}
2022-07-16T23:36:57Z	{"log":"    ^-- Heap [used=523MB, free=82.38%, comm=2969MB]\n","stream":"stderr","time":"2022-07-16T23:36:57.601891964Z"}
2022-07-16T23:36:57Z	{"log":"    ^-- CPU [CPUs=1, curLoad=1%, avgLoad=1.08%, GC=0%]\n","stream":"stderr","time":"2022-07-16T23:36:57.601887219Z"}
2022-07-16T23:36:57Z	{"log":"    ^-- Network [addrs=[10.233.96.234, 127.0.0.1], discoPort=47500, commPort=47100]\n","stream":"stderr","time":"2022-07-16T23:36:57.601882525Z"}
2022-07-16T23:36:57Z	{"log":"    ^-- Cluster [hosts=4, CPUs=97, servers=2, clients=2, topVer=766, minorTopVer=0]\n","stream":"stderr","time":"2022-07-16T23:36:57.601878147Z"}
2022-07-16T23:36:57Z	{"log":"    ^-- Node [id=6daf16ec, uptime=01:44:00.435]\n","stream":"stderr","time":"2022-07-16T23:36:57.601873583Z"}
2022-07-16T23:36:57Z	{"log":"Metrics for local node (to disable set 'metricsLogFrequency' to 0)\n","stream":"stderr","time":"2022-07-16T23:36:57.601867977Z"}
2022-07-16T23:36:57Z	{"log":"[23:36:57,601][INFO][grid-timeout-worker-#22][IgniteKernal] \n","stream":"stderr","time":"2022-07-16T23:36:57.601827933Z"}
2022-07-16T23:35:57Z	{"log":"    ^-- Striped thread pool [active=0, idle=8, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:35:57.592162745Z"}
2022-07-16T23:35:57Z	{"log":"    ^-- System thread pool [active=0, idle=8, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:35:57.592154734Z"}
2022-07-16T23:35:57Z	{"log":"    ^-- Public thread pool [active=0, idle=2, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:35:57.592146822Z"}
2022-07-16T23:35:57Z	{"log":"    ^-- Outbound messages queue [size=0]\n","stream":"stderr","time":"2022-07-16T23:35:57.592139106Z"}
2022-07-16T23:35:57Z	{"log":"    ^-- Ignite persistence [used=3292MB]\n","stream":"stderr","time":"2022-07-16T23:35:57.592131289Z"}
2022-07-16T23:35:57Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=0MB]\n","stream":"stderr","time":"2022-07-16T23:35:57.592123209Z"}
2022-07-16T23:35:57Z	{"log":"    ^--   volatileDsMemPlc region [type=user, persistence=false, lazyAlloc=true,\n","stream":"stderr","time":"2022-07-16T23:35:57.592114927Z"}
2022-07-16T23:35:57Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=100MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:35:57.592106642Z"}
2022-07-16T23:35:57Z	{"log":"    ^--   TxLog region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:35:57.592098292Z"}
2022-07-16T23:35:57Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.87%, allocRam=0MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:35:57.592089943Z"}
2022-07-16T23:35:57Z	{"log":"    ^--   metastoreMemPlc region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:35:57.592081801Z"}
2022-07-16T23:35:57Z	{"log":"      ...  initCfg=256MB, maxCfg=1228MB, usedRam=1180MB, freeRam=3.92%, allocRam=1228MB, allocTotal=3292MB]\n","stream":"stderr","time":"2022-07-16T23:35:57.592073501Z"}
2022-07-16T23:35:57Z	{"log":"    ^--   default region [type=default, persistence=true, lazyAlloc=true,\n","stream":"stderr","time":"2022-07-16T23:35:57.592065049Z"}
2022-07-16T23:35:57Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.99%, allocRam=100MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:35:57.592039434Z"}
2022-07-16T23:35:57Z	{"log":"    ^--   sysMemPlc region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:35:57.592030191Z"}
2022-07-16T23:35:57Z	{"log":"    ^-- Page memory [pages=298771]\n","stream":"stderr","time":"2022-07-16T23:35:57.592021235Z"}
2022-07-16T23:35:57Z	{"log":"    ^-- Off-heap memory [used=1180MB, free=27.51%, allocated=1428MB]\n","stream":"stderr","time":"2022-07-16T23:35:57.592011971Z"}
2022-07-16T23:35:57Z	{"log":"    ^-- Heap [used=517MB, free=82.58%, comm=2969MB]\n","stream":"stderr","time":"2022-07-16T23:35:57.592002934Z"}
2022-07-16T23:35:57Z	{"log":"    ^-- CPU [CPUs=1, curLoad=0.67%, avgLoad=1.07%, GC=0%]\n","stream":"stderr","time":"2022-07-16T23:35:57.591994152Z"}
2022-07-16T23:35:57Z	{"log":"    ^-- Network [addrs=[10.233.96.234, 127.0.0.1], discoPort=47500, commPort=47100]\n","stream":"stderr","time":"2022-07-16T23:35:57.591984945Z"}
2022-07-16T23:35:57Z	{"log":"    ^-- Cluster [hosts=4, CPUs=97, servers=2, clients=2, topVer=766, minorTopVer=0]\n","stream":"stderr","time":"2022-07-16T23:35:57.591975649Z"}
2022-07-16T23:35:57Z	{"log":"    ^-- Node [id=6daf16ec, uptime=01:43:00.423]\n","stream":"stderr","time":"2022-07-16T23:35:57.591966377Z"}
2022-07-16T23:35:57Z	{"log":"Metrics for local node (to disable set 'metricsLogFrequency' to 0)\n","stream":"stderr","time":"2022-07-16T23:35:57.591951598Z"}
2022-07-16T23:35:57Z	{"log":"[23:35:57,591][INFO][grid-timeout-worker-#22][IgniteKernal] \n","stream":"stderr","time":"2022-07-16T23:35:57.591910104Z"}
2022-07-16T23:35:35Z	{"log":"[23:35:35,332][INFO][exchange-worker-#62][GridCachePartitionExchangeManager] AffinityTopologyVersion [topVer=766, minorTopVer=0], evt=NODE_LEFT, evtNode=8d5b3dad-7f4c-4a79-8d98-c5f6b05a0fc7, client=false]\n","stream":"stderr","time":"2022-07-16T23:35:35.332350253Z"}
2022-07-16T23:35:35Z	{"log":"[23:35:35,329][INFO][exchange-worker-#62][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=766, minorTopVer=0], crd=false]\n","stream":"stderr","time":"2022-07-16T23:35:35.329202747Z"}
2022-07-16T23:35:35Z	{"log":"[23:35:35,328][INFO][exchange-worker-#62][GridDhtPartitionsExchangeFuture] Exchange longest local stages [startVer=AffinityTopologyVersion [topVer=766, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=766, minorTopVer=0]]\n","stream":"stderr","time":"2022-07-16T23:35:35.328977581Z"}
2022-07-16T23:35:35Z	{"log":"[23:35:35,328][INFO][exchange-worker-#62][GridDhtPartitionsExchangeFuture] Exchange timings [startVer=AffinityTopologyVersion [topVer=766, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=766, minorTopVer=0], stage=\"Waiting in exchange queue\" (0 ms), stage=\"Exchange parameters initialization\" (0 ms), stage=\"Determine exchange type\" (0 ms), stage=\"Exchange done\" (1 ms), stage=\"Total time\" (1 ms)]\n","stream":"stderr","time":"2022-07-16T23:35:35.328935767Z"}
2022-07-16T23:35:35Z	{"log":"[23:35:35,328][INFO][exchange-worker-#62][GridDhtPartitionsExchangeFuture] Completed partition exchange [localNode=6daf16ec-368f-44d5-9751-fa7b440eb264, exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=766, minorTopVer=0], evt=NODE_LEFT, evtNode=TcpDiscoveryNode [id=8d5b3dad-7f4c-4a79-8d98-c5f6b05a0fc7, consistentId=8d5b3dad-7f4c-4a79-8d98-c5f6b05a0fc7, addrs=ArrayList [10.233.96.94, 127.0.0.1], sockAddrs=HashSet [/127.0.0.1:0, /10.233.96.94:0], discPort=0, order=759, intOrder=382, lastExchangeTime=1658008371506, loc=false, ver=2.12.0#20220108-sha1:b1289f75, isClient=true], rebalanced=true, done=true, newCrdFut=null], topVer=AffinityTopologyVersion [topVer=766, minorTopVer=0]]\n","stream":"stderr","time":"2022-07-16T23:35:35.328871041Z"}
2022-07-16T23:35:35Z	{"log":"[23:35:35,328][INFO][exchange-worker-#62][GridDhtPartitionsExchangeFuture] Finish exchange future [startVer=AffinityTopologyVersion [topVer=766, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=766, minorTopVer=0], err=null, rebalanced=true, wasRebalanced=true]\n","stream":"stderr","time":"2022-07-16T23:35:35.32813686Z"}
2022-07-16T23:35:35Z	{"log":"[23:35:35,327][INFO][exchange-worker-#62][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=766, minorTopVer=0], crd=false, evt=NODE_LEFT, evtNode=8d5b3dad-7f4c-4a79-8d98-c5f6b05a0fc7, customEvt=null, allowMerge=true, exchangeFreeSwitch=false]\n","stream":"stderr","time":"2022-07-16T23:35:35.327214675Z"}
2022-07-16T23:35:35Z	{"log":"[23:35:35,315][INFO][disco-event-worker-#61][GridDiscoveryManager]   ^-- Baseline [id=0, size=2, online=2, offline=0]\n","stream":"stderr","time":"2022-07-16T23:35:35.315227624Z"}
2022-07-16T23:35:35Z	{"log":"[23:35:35,314][INFO][disco-event-worker-#61][GridDiscoveryManager] Topology snapshot [ver=766, locNode=6daf16ec, servers=2, clients=2, state=ACTIVE, CPUs=97, offheap=2.4GB, heap=19.0GB, aliveNodes=[TcpDiscoveryNode [id=b54b6cde-9d34-473f-afdf-9f0a7d843e2f, consistentId=932603da-955c-4b1a-8717-2ce2e875b20c, isClient=false, ver=2.12.0#20220108-sha1:b1289f75], TcpDiscoveryNode [id=5f797546-6f89-460c-b102-85d4ec7d1189, consistentId=5f797546-6f89-460c-b102-85d4ec7d1189, isClient=true, ver=2.12.0#20220108-sha1:b1289f75], TcpDiscoveryNode [id=6daf16ec-368f-44d5-9751-fa7b440eb264, consistentId=b849e31e-152b-4f1b-afd7-3bc107181677, isClient=false, ver=2.12.0#20220108-sha1:b1289f75], TcpDiscoveryNode [id=260cbabb-4302-49ac-ad53-fe32ca021026, consistentId=260cbabb-4302-49ac-ad53-fe32ca021026, isClient=true, ver=2.12.0#20220108-sha1:b1289f75]]]\n","stream":"stderr","time":"2022-07-16T23:35:35.315058366Z"}
2022-07-16T23:35:35Z	{"log":"[23:35:35,314][INFO][disco-event-worker-#61][GridDiscoveryManager] Node left topology: TcpDiscoveryNode [id=8d5b3dad-7f4c-4a79-8d98-c5f6b05a0fc7, consistentId=8d5b3dad-7f4c-4a79-8d98-c5f6b05a0fc7, addrs=ArrayList [10.233.96.94, 127.0.0.1], sockAddrs=HashSet [/127.0.0.1:0, /10.233.96.94:0], discPort=0, order=759, intOrder=382, lastExchangeTime=1658008371506, loc=false, ver=2.12.0#20220108-sha1:b1289f75, isClient=true]\n","stream":"stderr","time":"2022-07-16T23:35:35.314480333Z"}
2022-07-16T23:35:30Z	{"log":"[23:35:30,144][INFO][grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.233.96.234:47100, rmtAddr=/10.233.96.87:14056]\n","stream":"stderr","time":"2022-07-16T23:35:30.144435615Z"}
2022-07-16T23:35:23Z	{"log":"[23:35:23,678][INFO][exchange-worker-#62][GridCachePartitionExchangeManager] AffinityTopologyVersion [topVer=765, minorTopVer=0], evt=NODE_JOINED, evtNode=260cbabb-4302-49ac-ad53-fe32ca021026, client=false]\n","stream":"stderr","time":"2022-07-16T23:35:23.678907085Z"}
2022-07-16T23:35:23Z	{"log":"[23:35:23,673][INFO][exchange-worker-#62][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=765, minorTopVer=0], crd=false]\n","stream":"stderr","time":"2022-07-16T23:35:23.673653299Z"}
2022-07-16T23:35:23Z	{"log":"[23:35:23,673][INFO][exchange-worker-#62][GridDhtPartitionsExchangeFuture] Exchange longest local stages [startVer=AffinityTopologyVersion [topVer=765, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=765, minorTopVer=0]]\n","stream":"stderr","time":"2022-07-16T23:35:23.6732462Z"}
2022-07-16T23:35:23Z	{"log":"[23:35:23,673][INFO][exchange-worker-#62][GridDhtPartitionsExchangeFuture] Exchange timings [startVer=AffinityTopologyVersion [topVer=765, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=765, minorTopVer=0], stage=\"Waiting in exchange queue\" (0 ms), stage=\"Exchange parameters initialization\" (0 ms), stage=\"Determine exchange type\" (22 ms), stage=\"Exchange done\" (3 ms), stage=\"Total time\" (25 ms)]\n","stream":"stderr","time":"2022-07-16T23:35:23.67312508Z"}
2022-07-16T23:35:23Z	{"log":"[23:35:23,672][INFO][exchange-worker-#62][GridDhtPartitionsExchangeFuture] Completed partition exchange [localNode=6daf16ec-368f-44d5-9751-fa7b440eb264, exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=765, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode [id=260cbabb-4302-49ac-ad53-fe32ca021026, consistentId=260cbabb-4302-49ac-ad53-fe32ca021026, addrs=ArrayList [10.233.96.87, 127.0.0.1], sockAddrs=HashSet [/127.0.0.1:0, /10.233.96.87:0], discPort=0, order=765, intOrder=385, lastExchangeTime=1658014523612, loc=false, ver=2.12.0#20220108-sha1:b1289f75, isClient=true], rebalanced=true, done=true, newCrdFut=null], topVer=AffinityTopologyVersion [topVer=765, minorTopVer=0]]\n","stream":"stderr","time":"2022-07-16T23:35:23.672998134Z"}
2022-07-16T23:35:23Z	{"log":"[23:35:23,670][INFO][exchange-worker-#62][GridDhtPartitionsExchangeFuture] Finish exchange future [startVer=AffinityTopologyVersion [topVer=765, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=765, minorTopVer=0], err=null, rebalanced=true, wasRebalanced=true]\n","stream":"stderr","time":"2022-07-16T23:35:23.671034628Z"}
2022-07-16T23:35:23Z	{"log":"[23:35:23,646][INFO][exchange-worker-#62][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=765, minorTopVer=0], crd=false, evt=NODE_JOINED, evtNode=260cbabb-4302-49ac-ad53-fe32ca021026, customEvt=null, allowMerge=true, exchangeFreeSwitch=false]\n","stream":"stderr","time":"2022-07-16T23:35:23.646929673Z"}
2022-07-16T23:35:23Z	{"log":"[23:35:23,644][INFO][disco-event-worker-#61][GridDiscoveryManager]   ^-- Baseline [id=0, size=2, online=2, offline=0]\n","stream":"stderr","time":"2022-07-16T23:35:23.64474319Z"}
2022-07-16T23:35:23Z	{"log":"[23:35:23,644][INFO][disco-event-worker-#61][GridDiscoveryManager] Topology snapshot [ver=765, locNode=6daf16ec, servers=2, clients=3, state=ACTIVE, CPUs=112, offheap=2.4GB, heap=29.0GB, aliveNodes=[TcpDiscoveryNode [id=b54b6cde-9d34-473f-afdf-9f0a7d843e2f, consistentId=932603da-955c-4b1a-8717-2ce2e875b20c, isClient=false, ver=2.12.0#20220108-sha1:b1289f75], TcpDiscoveryNode [id=8d5b3dad-7f4c-4a79-8d98-c5f6b05a0fc7, consistentId=8d5b3dad-7f4c-4a79-8d98-c5f6b05a0fc7, isClient=true, ver=2.12.0#20220108-sha1:b1289f75], TcpDiscoveryNode [id=5f797546-6f89-460c-b102-85d4ec7d1189, consistentId=5f797546-6f89-460c-b102-85d4ec7d1189, isClient=true, ver=2.12.0#20220108-sha1:b1289f75], TcpDiscoveryNode [id=6daf16ec-368f-44d5-9751-fa7b440eb264, consistentId=b849e31e-152b-4f1b-afd7-3bc107181677, isClient=false, ver=2.12.0#20220108-sha1:b1289f75], TcpDiscoveryNode [id=260cbabb-4302-49ac-ad53-fe32ca021026, consistentId=260cbabb-4302-49ac-ad53-fe32ca021026, isClient=true, ver=2.12.0#20220108-sha1:b1289f75]]]\n","stream":"stderr","time":"2022-07-16T23:35:23.644600635Z"}
2022-07-16T23:35:23Z	{"log":"[23:35:23,643][INFO][disco-event-worker-#61][GridDiscoveryManager] Added new node to topology: TcpDiscoveryNode [id=260cbabb-4302-49ac-ad53-fe32ca021026, consistentId=260cbabb-4302-49ac-ad53-fe32ca021026, addrs=ArrayList [10.233.96.87, 127.0.0.1], sockAddrs=HashSet [/127.0.0.1:0, /10.233.96.87:0], discPort=0, order=765, intOrder=385, lastExchangeTime=1658014523612, loc=false, ver=2.12.0#20220108-sha1:b1289f75, isClient=true]\n","stream":"stderr","time":"2022-07-16T23:35:23.643897887Z"}
2022-07-16T23:35:23Z	{"log":"[23:35:23,642][WARNING][disco-event-worker-#61][GridDiscoveryManager] Local node's value of 'java.net.preferIPv4Stack' system property differs from remote node's (all nodes in topology should have identical value) [locPreferIpV4=true, rmtPreferIpV4=null, locId8=6daf16ec, rmtId8=260cbabb, rmtAddrs=[matchtv-79f66d7b74-592dk/10.233.96.87, /127.0.0.1], rmtNode=ClusterNode [id=260cbabb-4302-49ac-ad53-fe32ca021026, order=765, addr=[10.233.96.87, 127.0.0.1], daemon=false]]\n","stream":"stderr","time":"2022-07-16T23:35:23.643458713Z"}
2022-07-16T23:34:57Z	{"log":"    ^-- Striped thread pool [active=0, idle=8, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:34:57.584587331Z"}
2022-07-16T23:34:57Z	{"log":"    ^-- System thread pool [active=0, idle=7, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:34:57.584581402Z"}
2022-07-16T23:34:57Z	{"log":"    ^-- Public thread pool [active=0, idle=0, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:34:57.584575249Z"}
2022-07-16T23:34:57Z	{"log":"    ^-- Outbound messages queue [size=0]\n","stream":"stderr","time":"2022-07-16T23:34:57.584569382Z"}
2022-07-16T23:34:57Z	{"log":"    ^-- Ignite persistence [used=3292MB]\n","stream":"stderr","time":"2022-07-16T23:34:57.5845636Z"}
2022-07-16T23:34:57Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=0MB]\n","stream":"stderr","time":"2022-07-16T23:34:57.584557795Z"}
2022-07-16T23:34:57Z	{"log":"    ^--   volatileDsMemPlc region [type=user, persistence=false, lazyAlloc=true,\n","stream":"stderr","time":"2022-07-16T23:34:57.584552003Z"}
2022-07-16T23:34:57Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=100MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:34:57.584545951Z"}
2022-07-16T23:34:57Z	{"log":"    ^--   TxLog region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:34:57.584539649Z"}
2022-07-16T23:34:57Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.87%, allocRam=0MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:34:57.584513779Z"}
2022-07-16T23:34:57Z	{"log":"    ^--   metastoreMemPlc region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:34:57.584506867Z"}
2022-07-16T23:34:57Z	{"log":"      ...  initCfg=256MB, maxCfg=1228MB, usedRam=1180MB, freeRam=3.92%, allocRam=1228MB, allocTotal=3292MB]\n","stream":"stderr","time":"2022-07-16T23:34:57.584500051Z"}
2022-07-16T23:34:57Z	{"log":"    ^--   default region [type=default, persistence=true, lazyAlloc=true,\n","stream":"stderr","time":"2022-07-16T23:34:57.584493395Z"}
2022-07-16T23:34:57Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.99%, allocRam=100MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:34:57.584486743Z"}
2022-07-16T23:34:57Z	{"log":"    ^--   sysMemPlc region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:34:57.584480126Z"}
2022-07-16T23:34:57Z	{"log":"    ^-- Page memory [pages=298771]\n","stream":"stderr","time":"2022-07-16T23:34:57.584473732Z"}
2022-07-16T23:34:57Z	{"log":"    ^-- Off-heap memory [used=1180MB, free=27.51%, allocated=1428MB]\n","stream":"stderr","time":"2022-07-16T23:34:57.584467Z"}
2022-07-16T23:34:57Z	{"log":"    ^-- Heap [used=504MB, free=83%, comm=2969MB]\n","stream":"stderr","time":"2022-07-16T23:34:57.584460445Z"}
2022-07-16T23:34:57Z	{"log":"    ^-- CPU [CPUs=1, curLoad=1%, avgLoad=1.06%, GC=0%]\n","stream":"stderr","time":"2022-07-16T23:34:57.584453556Z"}
2022-07-16T23:34:57Z	{"log":"    ^-- Network [addrs=[10.233.96.234, 127.0.0.1], discoPort=47500, commPort=47100]\n","stream":"stderr","time":"2022-07-16T23:34:57.584446281Z"}
2022-07-16T23:34:57Z	{"log":"    ^-- Cluster [hosts=4, CPUs=97, servers=2, clients=2, topVer=764, minorTopVer=1]\n","stream":"stderr","time":"2022-07-16T23:34:57.584439479Z"}
2022-07-16T23:34:57Z	{"log":"    ^-- Node [id=6daf16ec, uptime=01:42:00.415]\n","stream":"stderr","time":"2022-07-16T23:34:57.584432735Z"}
2022-07-16T23:34:57Z	{"log":"Metrics for local node (to disable set 'metricsLogFrequency' to 0)\n","stream":"stderr","time":"2022-07-16T23:34:57.58442351Z"}
2022-07-16T23:34:57Z	{"log":"[23:34:57,584][INFO][grid-timeout-worker-#22][IgniteKernal] \n","stream":"stderr","time":"2022-07-16T23:34:57.584373917Z"}
2022-07-16T23:39:37Z	{"log":"    ^-- Striped thread pool [active=0, idle=8, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:39:37.235526749Z"}
2022-07-16T23:39:37Z	{"log":"    ^-- System thread pool [active=0, idle=7, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:39:37.235519731Z"}
2022-07-16T23:39:37Z	{"log":"    ^-- Public thread pool [active=0, idle=0, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:39:37.235512838Z"}
2022-07-16T23:39:37Z	{"log":"    ^-- Outbound messages queue [size=0]\n","stream":"stderr","time":"2022-07-16T23:39:37.235505858Z"}
2022-07-16T23:39:37Z	{"log":"    ^-- Ignite persistence [used=3175MB]\n","stream":"stderr","time":"2022-07-16T23:39:37.235498956Z"}
2022-07-16T23:39:37Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=0MB]\n","stream":"stderr","time":"2022-07-16T23:39:37.235491603Z"}
2022-07-16T23:39:37Z	{"log":"    ^--   volatileDsMemPlc region [type=user, persistence=false, lazyAlloc=true,\n","stream":"stderr","time":"2022-07-16T23:39:37.235484339Z"}
2022-07-16T23:39:37Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=100MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:39:37.235476693Z"}
2022-07-16T23:39:37Z	{"log":"    ^--   TxLog region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:39:37.235468473Z"}
2022-07-16T23:39:37Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.92%, allocRam=0MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:39:37.235441394Z"}
2022-07-16T23:39:37Z	{"log":"    ^--   metastoreMemPlc region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:39:37.235433373Z"}
2022-07-16T23:39:37Z	{"log":"      ...  initCfg=256MB, maxCfg=1228MB, usedRam=1205MB, freeRam=1.93%, allocRam=1228MB, allocTotal=3175MB]\n","stream":"stderr","time":"2022-07-16T23:39:37.235424926Z"}
2022-07-16T23:39:37Z	{"log":"    ^--   default region [type=default, persistence=true, lazyAlloc=true,\n","stream":"stderr","time":"2022-07-16T23:39:37.235416882Z"}
2022-07-16T23:39:37Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.98%, allocRam=100MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:39:37.23540862Z"}
2022-07-16T23:39:37Z	{"log":"    ^--   sysMemPlc region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:39:37.235400429Z"}
2022-07-16T23:39:37Z	{"log":"    ^-- Page memory [pages=304942]\n","stream":"stderr","time":"2022-07-16T23:39:37.235392414Z"}
2022-07-16T23:39:37Z	{"log":"    ^-- Off-heap memory [used=1205MB, free=26.01%, allocated=1428MB]\n","stream":"stderr","time":"2022-07-16T23:39:37.23538361Z"}
2022-07-16T23:39:37Z	{"log":"    ^-- Heap [used=1133MB, free=61.82%, comm=2969MB]\n","stream":"stderr","time":"2022-07-16T23:39:37.235375664Z"}
2022-07-16T23:39:37Z	{"log":"    ^-- CPU [CPUs=1, curLoad=1%, avgLoad=10.36%, GC=0%]\n","stream":"stderr","time":"2022-07-16T23:39:37.235367768Z"}
2022-07-16T23:39:37Z	{"log":"    ^-- Network [addrs=[10.233.96.120, 127.0.0.1], discoPort=47500, commPort=47100]\n","stream":"stderr","time":"2022-07-16T23:39:37.235359462Z"}
2022-07-16T23:39:37Z	{"log":"    ^-- Cluster [hosts=3, CPUs=17, servers=2, clients=1, topVer=767, minorTopVer=0]\n","stream":"stderr","time":"2022-07-16T23:39:37.235351236Z"}
2022-07-16T23:39:37Z	{"log":"    ^-- Node [id=b54b6cde, uptime=17 days, 04:09:11.960]\n","stream":"stderr","time":"2022-07-16T23:39:37.235342763Z"}
2022-07-16T23:39:37Z	{"log":"Metrics for local node (to disable set 'metricsLogFrequency' to 0)\n","stream":"stderr","time":"2022-07-16T23:39:37.235332263Z"}
2022-07-16T23:39:37Z	{"log":"[23:39:37,235][INFO][grid-timeout-worker-#22][IgniteKernal] \n","stream":"stderr","time":"2022-07-16T23:39:37.235277106Z"}
2022-07-16T23:38:37Z	{"log":"    ^-- Striped thread pool [active=0, idle=8, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:38:37.228151338Z"}
2022-07-16T23:38:37Z	{"log":"    ^-- System thread pool [active=0, idle=7, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:38:37.228149004Z"}
2022-07-16T23:38:37Z	{"log":"    ^-- Public thread pool [active=0, idle=0, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:38:37.228146507Z"}
2022-07-16T23:38:37Z	{"log":"    ^-- Outbound messages queue [size=0]\n","stream":"stderr","time":"2022-07-16T23:38:37.228144207Z"}
2022-07-16T23:38:37Z	{"log":"    ^-- Ignite persistence [used=3175MB]\n","stream":"stderr","time":"2022-07-16T23:38:37.22814181Z"}
2022-07-16T23:38:37Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=0MB]\n","stream":"stderr","time":"2022-07-16T23:38:37.228139488Z"}
2022-07-16T23:38:37Z	{"log":"    ^--   volatileDsMemPlc region [type=user, persistence=false, lazyAlloc=true,\n","stream":"stderr","time":"2022-07-16T23:38:37.228137089Z"}
2022-07-16T23:38:37Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=100MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:38:37.22813473Z"}
2022-07-16T23:38:37Z	{"log":"    ^--   TxLog region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:38:37.228132465Z"}
2022-07-16T23:38:37Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.92%, allocRam=0MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:38:37.228130084Z"}
2022-07-16T23:38:37Z	{"log":"    ^--   metastoreMemPlc region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:38:37.228127549Z"}
2022-07-16T23:38:37Z	{"log":"      ...  initCfg=256MB, maxCfg=1228MB, usedRam=1205MB, freeRam=1.93%, allocRam=1228MB, allocTotal=3175MB]\n","stream":"stderr","time":"2022-07-16T23:38:37.22812511Z"}
2022-07-16T23:38:37Z	{"log":"    ^--   default region [type=default, persistence=true, lazyAlloc=true,\n","stream":"stderr","time":"2022-07-16T23:38:37.22812268Z"}
2022-07-16T23:38:37Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.98%, allocRam=100MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:38:37.228120038Z"}
2022-07-16T23:38:37Z	{"log":"    ^--   sysMemPlc region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:38:37.22811061Z"}
2022-07-16T23:38:37Z	{"log":"    ^-- Page memory [pages=304942]\n","stream":"stderr","time":"2022-07-16T23:38:37.22810807Z"}
2022-07-16T23:38:37Z	{"log":"    ^-- Off-heap memory [used=1205MB, free=26.01%, allocated=1428MB]\n","stream":"stderr","time":"2022-07-16T23:38:37.228105433Z"}
2022-07-16T23:38:37Z	{"log":"    ^-- Heap [used=1116MB, free=62.41%, comm=2969MB]\n","stream":"stderr","time":"2022-07-16T23:38:37.228102889Z"}
2022-07-16T23:38:37Z	{"log":"    ^-- CPU [CPUs=1, curLoad=1%, avgLoad=10.37%, GC=0%]\n","stream":"stderr","time":"2022-07-16T23:38:37.228100209Z"}
2022-07-16T23:38:37Z	{"log":"    ^-- Network [addrs=[10.233.96.120, 127.0.0.1], discoPort=47500, commPort=47100]\n","stream":"stderr","time":"2022-07-16T23:38:37.228097488Z"}
2022-07-16T23:38:37Z	{"log":"    ^-- Cluster [hosts=3, CPUs=17, servers=2, clients=1, topVer=767, minorTopVer=0]\n","stream":"stderr","time":"2022-07-16T23:38:37.228094343Z"}
2022-07-16T23:38:37Z	{"log":"    ^-- Node [id=b54b6cde, uptime=17 days, 04:08:11.957]\n","stream":"stderr","time":"2022-07-16T23:38:37.22809114Z"}
2022-07-16T23:38:37Z	{"log":"Metrics for local node (to disable set 'metricsLogFrequency' to 0)\n","stream":"stderr","time":"2022-07-16T23:38:37.228087826Z"}
2022-07-16T23:38:37Z	{"log":"[23:38:37,227][INFO][grid-timeout-worker-#22][IgniteKernal] \n","stream":"stderr","time":"2022-07-16T23:38:37.228067859Z"}
2022-07-16T23:38:17Z	{"log":"[23:38:17,892][INFO][db-checkpoint-thread-#73][Checkpointer] Checkpoint finished [cpId=bed59300-0a6e-4627-8772-801834a06e7c, pages=3932, markPos=WALPointer [idx=28382, fileOff=36128292, len=107140], walSegmentsCovered=[], markDuration=36ms, pagesWrite=176ms, fsync=126ms, total=358ms]\n","stream":"stderr","time":"2022-07-16T23:38:17.892623722Z"}
2022-07-16T23:38:17Z	{"log":"[23:38:17,590][INFO][db-checkpoint-thread-#73][Checkpointer] Checkpoint started [checkpointId=bed59300-0a6e-4627-8772-801834a06e7c, startPtr=WALPointer [idx=28382, fileOff=36128292, len=107140], checkpointBeforeLockTime=20ms, checkpointLockWait=0ms, checkpointListenersExecuteTime=8ms, checkpointLockHoldTime=10ms, walCpRecordFsyncDuration=21ms, writeCheckpointEntryDuration=2ms, splitAndSortCpPagesDuration=3ms, pages=3932, reason='timeout']\n","stream":"stderr","time":"2022-07-16T23:38:17.590678073Z"}
2022-07-16T23:37:42Z	{"log":"\u0009... 20 more\n","stream":"stderr","time":"2022-07-16T23:37:42.508776614Z"}
2022-07-16T23:37:42Z	{"log":"Caused by: class org.apache.ignite.internal.processors.cache.CacheInvalidStateException: Failed to execute query because cache partition has been lostParts [cacheName=PipelineConfig, part=0]\n","stream":"stderr","time":"2022-07-16T23:37:42.508771466Z"}
2022-07-16T23:37:42Z	{"log":"\u0009at java.base/java.lang.Thread.run(Thread.java:829)\n","stream":"stderr","time":"2022-07-16T23:37:42.508766495Z"}
2022-07-16T23:37:42Z	{"log":"\u0009at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n","stream":"stderr","time":"2022-07-16T23:37:42.508761438Z"}
2022-07-16T23:37:42Z	{"log":"\u0009at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n","stream":"stderr","time":"2022-07-16T23:37:42.50875644Z"}
2022-07-16T23:37:42Z	{"log":"\u0009at org.apache.ignite.internal.util.worker.GridWorkerPool$1.run(GridWorkerPool.java:70)\n","stream":"stderr","time":"2022-07-16T23:37:42.508751443Z"}
2022-07-16T23:37:42Z	{"log":"\u0009at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)\n","stream":"stderr","time":"2022-07-16T23:37:42.5087466Z"}
2022-07-16T23:37:42Z	{"log":"\u0009at org.apache.ignite.internal.util.nio.GridNioAsyncNotifyFilter$3.body(GridNioAsyncNotifyFilter.java:97)\n","stream":"stderr","time":"2022-07-16T23:37:42.508741433Z"}
2022-07-16T23:37:42Z	{"log":"\u0009at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)\n","stream":"stderr","time":"2022-07-16T23:37:42.508736576Z"}
2022-07-16T23:37:42Z	{"log":"\u0009at org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onMessageReceived(GridNioFilterChain.java:279)\n","stream":"stderr","time":"2022-07-16T23:37:42.508730415Z"}
2022-07-16T23:37:42Z	{"log":"\u0009at org.apache.ignite.internal.processors.odbc.ClientListenerNioListener.onMessage(ClientListenerNioListener.java:55)\n","stream":"stderr","time":"2022-07-16T23:37:42.508725381Z"}
2022-07-16T23:37:42Z	{"log":"\u0009at org.apache.ignite.internal.processors.odbc.ClientListenerNioListener.onMessage(ClientListenerNioListener.java:204)\n","stream":"stderr","time":"2022-07-16T23:37:42.508704068Z"}
2022-07-16T23:37:42Z	{"log":"\u0009at org.apache.ignite.internal.processors.platform.client.ClientRequestHandler.handle(ClientRequestHandler.java:99)\n","stream":"stderr","time":"2022-07-16T23:37:42.508699057Z"}
2022-07-16T23:37:42Z	{"log":"\u0009at org.apache.ignite.internal.processors.platform.client.cache.ClientCacheScanQueryRequest.process(ClientCacheScanQueryRequest.java:88)\n","stream":"stderr","time":"2022-07-16T23:37:42.508693849Z"}
2022-07-16T23:37:42Z	{"log":"\u0009at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.query(GatewayProtectedCacheProxy.java:418)\n","stream":"stderr","time":"2022-07-16T23:37:42.50868874Z"}
2022-07-16T23:37:42Z	{"log":"\u0009at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:843)\n","stream":"stderr","time":"2022-07-16T23:37:42.508684107Z"}
2022-07-16T23:37:42Z	{"log":"\u0009at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:516)\n","stream":"stderr","time":"2022-07-16T23:37:42.508679292Z"}
2022-07-16T23:37:42Z	{"log":"\u0009at org.apache.ignite.internal.processors.query.GridQueryProcessor.executeQuery(GridQueryProcessor.java:3482)\n","stream":"stderr","time":"2022-07-16T23:37:42.508674598Z"}
2022-07-16T23:37:42Z	{"log":"\u0009at org.apache.ignite.internal.util.lang.IgniteOutClosureX.apply(IgniteOutClosureX.java:36)\n","stream":"stderr","time":"2022-07-16T23:37:42.508669878Z"}
2022-07-16T23:37:42Z	{"log":"\u0009at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl$1.applyx(IgniteCacheProxyImpl.java:517)\n","stream":"stderr","time":"2022-07-16T23:37:42.508664888Z"}
2022-07-16T23:37:42Z	{"log":"\u0009at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl$1.applyx(IgniteCacheProxyImpl.java:519)\n","stream":"stderr","time":"2022-07-16T23:37:42.508659678Z"}
2022-07-16T23:37:42Z	{"log":"\u0009at org.apache.ignite.internal.processors.cache.query.GridCacheQueryAdapter.executeScanQuery(GridCacheQueryAdapter.java:597)\n","stream":"stderr","time":"2022-07-16T23:37:42.508653868Z"}
2022-07-16T23:37:42Z	{"log":"javax.cache.CacheException: class org.apache.ignite.internal.processors.cache.CacheInvalidStateException: Failed to execute query because cache partition has been lostParts [cacheName=PipelineConfig, part=0]\n","stream":"stderr","time":"2022-07-16T23:37:42.508647013Z"}
2022-07-16T23:37:42Z	{"log":"[23:37:42,508][SEVERE][client-connector-#127][ClientListenerNioListener] Failed to process client request [req=o.a.i.i.processors.platform.client.cache.ClientCacheScanQueryRequest@6c611807, msg=class o.a.i.i.processors.cache.CacheInvalidStateException: Failed to execute query because cache partition has been lostParts [cacheName=PipelineConfig, part=0]]\n","stream":"stderr","time":"2022-07-16T23:37:42.508608563Z"}
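
The failing request above is a thin-client scan query that hit partition 0 of the PipelineConfig cache after the NODE_LEFT events earlier in the log. As far as I understand, the cache keeps rejecting operations on that partition until the lost state is reset. A minimal sketch for checking which caches still report lost partitions before resetting anything (thick-client/server API, config path hypothetical):

    import java.util.Collection;
    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;

    public class LostPartitionsReport {
        public static void main(String[] args) {
            try (Ignite ignite = Ignition.start("client-config.xml")) {
                for (String name : ignite.cacheNames()) {
                    IgniteCache<Object, Object> cache = ignite.cache(name);
                    // lostPartitions() is available on the thick-client/server cache proxy.
                    Collection<Integer> lost = cache.lostPartitions();
                    if (!lost.isEmpty())
                        System.out.println(name + " -> lost partitions: " + lost);
                }
            }
        }
    }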
2022-07-16T23:37:37Z	{"log":"    ^-- Striped thread pool [active=0, idle=8, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:37:37.234280891Z"}
2022-07-16T23:37:37Z	{"log":"    ^-- System thread pool [active=0, idle=8, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:37:37.234273189Z"}
2022-07-16T23:37:37Z	{"log":"    ^-- Public thread pool [active=0, idle=2, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:37:37.234265685Z"}
2022-07-16T23:37:37Z	{"log":"    ^-- Outbound messages queue [size=0]\n","stream":"stderr","time":"2022-07-16T23:37:37.234257945Z"}
2022-07-16T23:37:37Z	{"log":"    ^-- Ignite persistence [used=3175MB]\n","stream":"stderr","time":"2022-07-16T23:37:37.234250145Z"}
2022-07-16T23:37:37Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=0MB]\n","stream":"stderr","time":"2022-07-16T23:37:37.234242132Z"}
2022-07-16T23:37:37Z	{"log":"    ^--   volatileDsMemPlc region [type=user, persistence=false, lazyAlloc=true,\n","stream":"stderr","time":"2022-07-16T23:37:37.234233246Z"}
2022-07-16T23:37:37Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=100MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:37:37.234224791Z"}
2022-07-16T23:37:37Z	{"log":"    ^--   TxLog region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:37:37.234217098Z"}
2022-07-16T23:37:37Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.92%, allocRam=0MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:37:37.234208727Z"}
2022-07-16T23:37:37Z	{"log":"    ^--   metastoreMemPlc region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:37:37.234182545Z"}
2022-07-16T23:37:37Z	{"log":"      ...  initCfg=256MB, maxCfg=1228MB, usedRam=1205MB, freeRam=1.93%, allocRam=1228MB, allocTotal=3175MB]\n","stream":"stderr","time":"2022-07-16T23:37:37.234172199Z"}
2022-07-16T23:37:37Z	{"log":"    ^--   default region [type=default, persistence=true, lazyAlloc=true,\n","stream":"stderr","time":"2022-07-16T23:37:37.234163213Z"}
2022-07-16T23:37:37Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.98%, allocRam=100MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:37:37.234153533Z"}
2022-07-16T23:37:37Z	{"log":"    ^--   sysMemPlc region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:37:37.234143474Z"}
2022-07-16T23:37:37Z	{"log":"    ^-- Page memory [pages=304942]\n","stream":"stderr","time":"2022-07-16T23:37:37.234134543Z"}
2022-07-16T23:37:37Z	{"log":"    ^-- Off-heap memory [used=1205MB, free=26.01%, allocated=1428MB]\n","stream":"stderr","time":"2022-07-16T23:37:37.234125148Z"}
2022-07-16T23:37:37Z	{"log":"    ^-- Heap [used=1091MB, free=63.26%, comm=2969MB]\n","stream":"stderr","time":"2022-07-16T23:37:37.234115889Z"}
2022-07-16T23:37:37Z	{"log":"    ^-- CPU [CPUs=1, curLoad=1.33%, avgLoad=10.39%, GC=0%]\n","stream":"stderr","time":"2022-07-16T23:37:37.234106836Z"}
2022-07-16T23:37:37Z	{"log":"    ^-- Network [addrs=[10.233.96.120, 127.0.0.1], discoPort=47500, commPort=47100]\n","stream":"stderr","time":"2022-07-16T23:37:37.234098078Z"}
2022-07-16T23:37:37Z	{"log":"    ^-- Cluster [hosts=3, CPUs=17, servers=2, clients=1, topVer=767, minorTopVer=0]\n","stream":"stderr","time":"2022-07-16T23:37:37.234089183Z"}
2022-07-16T23:37:37Z	{"log":"    ^-- Node [id=b54b6cde, uptime=17 days, 04:07:11.957]\n","stream":"stderr","time":"2022-07-16T23:37:37.23407893Z"}
2022-07-16T23:37:37Z	{"log":"Metrics for local node (to disable set 'metricsLogFrequency' to 0)\n","stream":"stderr","time":"2022-07-16T23:37:37.234066776Z"}
2022-07-16T23:37:37Z	{"log":"[23:37:37,233][INFO][grid-timeout-worker-#22][IgniteKernal] \n","stream":"stderr","time":"2022-07-16T23:37:37.234008715Z"}
2022-07-16T23:37:13Z	{"log":"[23:37:12,862][INFO][exchange-worker-#54][GridCachePartitionExchangeManager] AffinityTopologyVersion [topVer=767, minorTopVer=0], evt=NODE_LEFT, evtNode=5f797546-6f89-460c-b102-85d4ec7d1189, client=false]\n","stream":"stderr","time":"2022-07-16T23:37:12.862224651Z"}
2022-07-16T23:37:13Z	{"log":"[23:37:12,861][INFO][exchange-worker-#54][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=767, minorTopVer=0], crd=true]\n","stream":"stderr","time":"2022-07-16T23:37:12.861306402Z"}
2022-07-16T23:37:13Z	{"log":"[23:37:12,860][INFO][exchange-worker-#54][GridDhtPartitionsExchangeFuture] Exchange longest local stages [startVer=AffinityTopologyVersion [topVer=767, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=767, minorTopVer=0]]\n","stream":"stderr","time":"2022-07-16T23:37:12.860864828Z"}
2022-07-16T23:37:13Z	{"log":"[23:37:12,860][INFO][exchange-worker-#54][GridDhtPartitionsExchangeFuture] Exchange timings [startVer=AffinityTopologyVersion [topVer=767, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=767, minorTopVer=0], stage=\"Waiting in exchange queue\" (0 ms), stage=\"Exchange parameters initialization\" (0 ms), stage=\"Determine exchange type\" (0 ms), stage=\"Exchange done\" (2 ms), stage=\"Total time\" (2 ms)]\n","stream":"stderr","time":"2022-07-16T23:37:12.860847469Z"}
2022-07-16T23:37:13Z	{"log":"[23:37:12,860][INFO][exchange-worker-#54][GridDhtPartitionsExchangeFuture] Completed partition exchange [localNode=b54b6cde-9d34-473f-afdf-9f0a7d843e2f, exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=767, minorTopVer=0], evt=NODE_LEFT, evtNode=TcpDiscoveryNode [id=5f797546-6f89-460c-b102-85d4ec7d1189, consistentId=5f797546-6f89-460c-b102-85d4ec7d1189, addrs=ArrayList [10.233.96.16, 127.0.0.1], sockAddrs=HashSet [/10.233.96.16:0, /127.0.0.1:0], discPort=0, order=762, intOrder=383, lastExchangeTime=1658007490342, loc=false, ver=2.12.0#20220108-sha1:b1289f75, isClient=true], rebalanced=true, done=true, newCrdFut=null], topVer=AffinityTopologyVersion [topVer=767, minorTopVer=0]]\n","stream":"stderr","time":"2022-07-16T23:37:12.860753749Z"}
2022-07-16T23:37:13Z	{"log":"[23:37:12,858][INFO][exchange-worker-#54][GridDhtPartitionsExchangeFuture] Finish exchange future [startVer=AffinityTopologyVersion [topVer=767, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=767, minorTopVer=0], err=null, rebalanced=true, wasRebalanced=true]\n","stream":"stderr","time":"2022-07-16T23:37:12.858892291Z"}
2022-07-16T23:37:13Z	{"log":"[23:37:12,857][INFO][exchange-worker-#54][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=767, minorTopVer=0], crd=true, evt=NODE_LEFT, evtNode=5f797546-6f89-460c-b102-85d4ec7d1189, customEvt=null, allowMerge=true, exchangeFreeSwitch=false]\n","stream":"stderr","time":"2022-07-16T23:37:12.857664217Z"}
2022-07-16T23:37:13Z	{"log":"[23:37:12,855][INFO][disco-event-worker-#52][GridDiscoveryManager]   ^-- Baseline [id=0, size=2, online=2, offline=0]\n","stream":"stderr","time":"2022-07-16T23:37:12.855756752Z"}
2022-07-16T23:37:13Z	{"log":"[23:37:12,855][INFO][tcp-disco-sock-reader-[5f797546 10.233.96.16:38557 client]-#400-#195540][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/10.233.96.16:38557, rmtPort=38557, rmtNodeId=5f797546-6f89-460c-b102-85d4ec7d1189]\n","stream":"stderr","time":"2022-07-16T23:37:12.855744901Z"}
2022-07-16T23:37:13Z	{"log":"[23:37:12,855][INFO][disco-event-worker-#52][GridDiscoveryManager] Topology snapshot [ver=767, locNode=b54b6cde, servers=2, clients=1, state=ACTIVE, CPUs=17, offheap=2.4GB, heap=16.0GB, aliveNodes=[TcpDiscoveryNode [id=b54b6cde-9d34-473f-afdf-9f0a7d843e2f, consistentId=932603da-955c-4b1a-8717-2ce2e875b20c, isClient=false, ver=2.12.0#20220108-sha1:b1289f75], TcpDiscoveryNode [id=6daf16ec-368f-44d5-9751-fa7b440eb264, consistentId=b849e31e-152b-4f1b-afd7-3bc107181677, isClient=false, ver=2.12.0#20220108-sha1:b1289f75], TcpDiscoveryNode [id=260cbabb-4302-49ac-ad53-fe32ca021026, consistentId=260cbabb-4302-49ac-ad53-fe32ca021026, isClient=true, ver=2.12.0#20220108-sha1:b1289f75]]]\n","stream":"stderr","time":"2022-07-16T23:37:12.85570695Z"}
2022-07-16T23:37:13Z	{"log":"[23:37:12,855][INFO][disco-event-worker-#52][GridDiscoveryManager] Node left topology: TcpDiscoveryNode [id=5f797546-6f89-460c-b102-85d4ec7d1189, consistentId=5f797546-6f89-460c-b102-85d4ec7d1189, addrs=ArrayList [10.233.96.16, 127.0.0.1], sockAddrs=HashSet [/10.233.96.16:0, /127.0.0.1:0], discPort=0, order=762, intOrder=383, lastExchangeTime=1658007490342, loc=false, ver=2.12.0#20220108-sha1:b1289f75, isClient=true]\n","stream":"stderr","time":"2022-07-16T23:37:12.855424773Z"}
2022-07-16T23:36:37Z	{"log":"    ^-- Striped thread pool [active=0, idle=8, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:36:37.230533509Z"}
2022-07-16T23:36:37Z	{"log":"    ^-- System thread pool [active=0, idle=7, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:36:37.230525018Z"}
2022-07-16T23:36:37Z	{"log":"    ^-- Public thread pool [active=0, idle=0, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:36:37.230516722Z"}
2022-07-16T23:36:37Z	{"log":"    ^-- Outbound messages queue [size=0]\n","stream":"stderr","time":"2022-07-16T23:36:37.23050821Z"}
2022-07-16T23:36:37Z	{"log":"    ^-- Ignite persistence [used=3175MB]\n","stream":"stderr","time":"2022-07-16T23:36:37.230500437Z"}
2022-07-16T23:36:37Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=0MB]\n","stream":"stderr","time":"2022-07-16T23:36:37.230492355Z"}
2022-07-16T23:36:37Z	{"log":"    ^--   volatileDsMemPlc region [type=user, persistence=false, lazyAlloc=true,\n","stream":"stderr","time":"2022-07-16T23:36:37.230484447Z"}
2022-07-16T23:36:37Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=100MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:36:37.230476292Z"}
2022-07-16T23:36:37Z	{"log":"    ^--   TxLog region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:36:37.230468148Z"}
2022-07-16T23:36:37Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.92%, allocRam=0MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:36:37.230459728Z"}
2022-07-16T23:36:37Z	{"log":"    ^--   metastoreMemPlc region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:36:37.230451208Z"}
2022-07-16T23:36:37Z	{"log":"      ...  initCfg=256MB, maxCfg=1228MB, usedRam=1205MB, freeRam=1.93%, allocRam=1228MB, allocTotal=3175MB]\n","stream":"stderr","time":"2022-07-16T23:36:37.230441677Z"}
2022-07-16T23:36:37Z	{"log":"    ^--   default region [type=default, persistence=true, lazyAlloc=true,\n","stream":"stderr","time":"2022-07-16T23:36:37.23041221Z"}
2022-07-16T23:36:37Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.98%, allocRam=100MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:36:37.230399883Z"}
2022-07-16T23:36:37Z	{"log":"    ^--   sysMemPlc region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:36:37.230365252Z"}
2022-07-16T23:36:37Z	{"log":"    ^-- Page memory [pages=304942]\n","stream":"stderr","time":"2022-07-16T23:36:37.230356412Z"}
2022-07-16T23:36:37Z	{"log":"    ^-- Off-heap memory [used=1205MB, free=26.01%, allocated=1428MB]\n","stream":"stderr","time":"2022-07-16T23:36:37.230347289Z"}
2022-07-16T23:36:37Z	{"log":"    ^-- Heap [used=1075MB, free=63.8%, comm=2969MB]\n","stream":"stderr","time":"2022-07-16T23:36:37.230338133Z"}
2022-07-16T23:36:37Z	{"log":"    ^-- CPU [CPUs=1, curLoad=1.33%, avgLoad=10.42%, GC=0%]\n","stream":"stderr","time":"2022-07-16T23:36:37.230328311Z"}
2022-07-16T23:36:37Z	{"log":"    ^-- Network [addrs=[10.233.96.120, 127.0.0.1], discoPort=47500, commPort=47100]\n","stream":"stderr","time":"2022-07-16T23:36:37.230318917Z"}
2022-07-16T23:36:37Z	{"log":"    ^-- Cluster [hosts=4, CPUs=97, servers=2, clients=2, topVer=766, minorTopVer=0]\n","stream":"stderr","time":"2022-07-16T23:36:37.230309374Z"}
2022-07-16T23:36:37Z	{"log":"    ^-- Node [id=b54b6cde, uptime=17 days, 04:06:11.955]\n","stream":"stderr","time":"2022-07-16T23:36:37.230297831Z"}
2022-07-16T23:36:37Z	{"log":"Metrics for local node (to disable set 'metricsLogFrequency' to 0)\n","stream":"stderr","time":"2022-07-16T23:36:37.230285895Z"}
2022-07-16T23:36:37Z	{"log":"[23:36:37,229][INFO][grid-timeout-worker-#22][IgniteKernal] \n","stream":"stderr","time":"2022-07-16T23:36:37.23022931Z"}
2022-07-16T23:35:37Z	{"log":"    ^-- Striped thread pool [active=0, idle=8, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:35:37.220421139Z"}
2022-07-16T23:35:37Z	{"log":"    ^-- System thread pool [active=0, idle=8, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:35:37.220418825Z"}
2022-07-16T23:35:37Z	{"log":"    ^-- Public thread pool [active=0, idle=2, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:35:37.220416566Z"}
2022-07-16T23:35:37Z	{"log":"    ^-- Outbound messages queue [size=0]\n","stream":"stderr","time":"2022-07-16T23:35:37.220414395Z"}
2022-07-16T23:35:37Z	{"log":"    ^-- Ignite persistence [used=3175MB]\n","stream":"stderr","time":"2022-07-16T23:35:37.220412058Z"}
2022-07-16T23:35:37Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=0MB]\n","stream":"stderr","time":"2022-07-16T23:35:37.220409583Z"}
2022-07-16T23:35:37Z	{"log":"    ^--   volatileDsMemPlc region [type=user, persistence=false, lazyAlloc=true,\n","stream":"stderr","time":"2022-07-16T23:35:37.220407182Z"}
2022-07-16T23:35:37Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=100MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:35:37.220404667Z"}
2022-07-16T23:35:37Z	{"log":"    ^--   TxLog region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:35:37.220402411Z"}
2022-07-16T23:35:37Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.92%, allocRam=0MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:35:37.220399961Z"}
2022-07-16T23:35:37Z	{"log":"    ^--   metastoreMemPlc region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:35:37.22039767Z"}
2022-07-16T23:35:37Z	{"log":"      ...  initCfg=256MB, maxCfg=1228MB, usedRam=1205MB, freeRam=1.93%, allocRam=1228MB, allocTotal=3175MB]\n","stream":"stderr","time":"2022-07-16T23:35:37.220395344Z"}
2022-07-16T23:35:37Z	{"log":"    ^--   default region [type=default, persistence=true, lazyAlloc=true,\n","stream":"stderr","time":"2022-07-16T23:35:37.22039304Z"}
2022-07-16T23:35:37Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.98%, allocRam=100MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:35:37.220390602Z"}
2022-07-16T23:35:37Z	{"log":"    ^--   sysMemPlc region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:35:37.220388226Z"}
2022-07-16T23:35:37Z	{"log":"    ^-- Page memory [pages=304942]\n","stream":"stderr","time":"2022-07-16T23:35:37.220385817Z"}
2022-07-16T23:35:37Z	{"log":"    ^-- Off-heap memory [used=1205MB, free=26.01%, allocated=1428MB]\n","stream":"stderr","time":"2022-07-16T23:35:37.220383409Z"}
2022-07-16T23:35:37Z	{"log":"    ^-- Heap [used=1060MB, free=64.28%, comm=2969MB]\n","stream":"stderr","time":"2022-07-16T23:35:37.22036772Z"}
2022-07-16T23:35:37Z	{"log":"    ^-- CPU [CPUs=1, curLoad=3.67%, avgLoad=10.44%, GC=0%]\n","stream":"stderr","time":"2022-07-16T23:35:37.220365482Z"}
2022-07-16T23:35:37Z	{"log":"    ^-- Network [addrs=[10.233.96.120, 127.0.0.1], discoPort=47500, commPort=47100]\n","stream":"stderr","time":"2022-07-16T23:35:37.220363045Z"}
2022-07-16T23:35:37Z	{"log":"    ^-- Cluster [hosts=4, CPUs=97, servers=2, clients=2, topVer=766, minorTopVer=0]\n","stream":"stderr","time":"2022-07-16T23:35:37.22036054Z"}
2022-07-16T23:35:37Z	{"log":"    ^-- Node [id=b54b6cde, uptime=17 days, 04:05:11.951]\n","stream":"stderr","time":"2022-07-16T23:35:37.220357994Z"}
2022-07-16T23:35:37Z	{"log":"Metrics for local node (to disable set 'metricsLogFrequency' to 0)\n","stream":"stderr","time":"2022-07-16T23:35:37.220354902Z"}
2022-07-16T23:35:37Z	{"log":"[23:35:37,220][INFO][grid-timeout-worker-#22][IgniteKernal] \n","stream":"stderr","time":"2022-07-16T23:35:37.220320334Z"}
2022-07-16T23:35:35Z	{"log":"[23:35:35,317][INFO][tcp-disco-sock-reader-[8d5b3dad 10.233.96.94:35953 client]-#402-#195655][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/10.233.96.94:35953, rmtPort=35953, rmtNodeId=8d5b3dad-7f4c-4a79-8d98-c5f6b05a0fc7]\n","stream":"stderr","time":"2022-07-16T23:35:35.317311384Z"}
2022-07-16T23:35:35Z	{"log":"[23:35:35,315][INFO][exchange-worker-#54][GridCachePartitionExchangeManager] AffinityTopologyVersion [topVer=766, minorTopVer=0], evt=NODE_LEFT, evtNode=8d5b3dad-7f4c-4a79-8d98-c5f6b05a0fc7, client=false]\n","stream":"stderr","time":"2022-07-16T23:35:35.315779586Z"}
2022-07-16T23:35:35Z	{"log":"[23:35:35,315][INFO][exchange-worker-#54][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=766, minorTopVer=0], crd=true]\n","stream":"stderr","time":"2022-07-16T23:35:35.315170838Z"}
2022-07-16T23:35:35Z	{"log":"[23:35:35,314][INFO][exchange-worker-#54][GridDhtPartitionsExchangeFuture] Exchange longest local stages [startVer=AffinityTopologyVersion [topVer=766, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=766, minorTopVer=0]]\n","stream":"stderr","time":"2022-07-16T23:35:35.314937762Z"}
2022-07-16T23:35:35Z	{"log":"[23:35:35,314][INFO][exchange-worker-#54][GridDhtPartitionsExchangeFuture] Exchange timings [startVer=AffinityTopologyVersion [topVer=766, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=766, minorTopVer=0], stage=\"Waiting in exchange queue\" (0 ms), stage=\"Exchange parameters initialization\" (0 ms), stage=\"Determine exchange type\" (0 ms), stage=\"Exchange done\" (1 ms), stage=\"Total time\" (1 ms)]\n","stream":"stderr","time":"2022-07-16T23:35:35.314912229Z"}
2022-07-16T23:35:35Z	{"log":"[23:35:35,314][INFO][exchange-worker-#54][GridDhtPartitionsExchangeFuture] Completed partition exchange [localNode=b54b6cde-9d34-473f-afdf-9f0a7d843e2f, exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=766, minorTopVer=0], evt=NODE_LEFT, evtNode=TcpDiscoveryNode [id=8d5b3dad-7f4c-4a79-8d98-c5f6b05a0fc7, consistentId=8d5b3dad-7f4c-4a79-8d98-c5f6b05a0fc7, addrs=ArrayList [10.233.96.94, 127.0.0.1], sockAddrs=HashSet [/127.0.0.1:0, /10.233.96.94:0], discPort=0, order=759, intOrder=382, lastExchangeTime=1658007338647, loc=false, ver=2.12.0#20220108-sha1:b1289f75, isClient=true], rebalanced=true, done=true, newCrdFut=null], topVer=AffinityTopologyVersion [topVer=766, minorTopVer=0]]\n","stream":"stderr","time":"2022-07-16T23:35:35.3148781Z"}
2022-07-16T23:35:35Z	{"log":"[23:35:35,313][INFO][exchange-worker-#54][GridDhtPartitionsExchangeFuture] Finish exchange future [startVer=AffinityTopologyVersion [topVer=766, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=766, minorTopVer=0], err=null, rebalanced=true, wasRebalanced=true]\n","stream":"stderr","time":"2022-07-16T23:35:35.313754453Z"}
2022-07-16T23:35:35Z	{"log":"[23:35:35,313][INFO][exchange-worker-#54][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=766, minorTopVer=0], crd=true, evt=NODE_LEFT, evtNode=8d5b3dad-7f4c-4a79-8d98-c5f6b05a0fc7, customEvt=null, allowMerge=true, exchangeFreeSwitch=false]\n","stream":"stderr","time":"2022-07-16T23:35:35.313310557Z"}
2022-07-16T23:35:35Z	{"log":"\u0009at java.base/java.lang.Thread.run(Thread.java:829)\n","stream":"stderr","time":"2022-07-16T23:35:35.31228367Z"}
2022-07-16T23:35:35Z	{"log":"\u0009at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)\n","stream":"stderr","time":"2022-07-16T23:35:35.312254951Z"}
2022-07-16T23:35:35Z	{"log":"\u0009at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1910)\n","stream":"stderr","time":"2022-07-16T23:35:35.312246179Z"}
2022-07-16T23:35:35Z	{"log":"\u0009at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2157)\n","stream":"stderr","time":"2022-07-16T23:35:35.312237238Z"}
2022-07-16T23:35:35Z	{"log":"\u0009at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.close(GridNioServer.java:2794)\n","stream":"stderr","time":"2022-07-16T23:35:35.312229178Z"}
2022-07-16T23:35:35Z	{"log":"\u0009at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.close(GridNioServer.java:2835)\n","stream":"stderr","time":"2022-07-16T23:35:35.312221057Z"}
2022-07-16T23:35:35Z	{"log":"\u0009at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.closeKey(GridNioServer.java:2784)\n","stream":"stderr","time":"2022-07-16T23:35:35.312212746Z"}
2022-07-16T23:35:35Z	{"log":"\u0009at org.apache.ignite.internal.util.IgniteUtils.close(IgniteUtils.java:4248)\n","stream":"stderr","time":"2022-07-16T23:35:35.312204187Z"}
2022-07-16T23:35:35Z	{"log":"\u0009at java.base/sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:406)\n","stream":"stderr","time":"2022-07-16T23:35:35.312195787Z"}
2022-07-16T23:35:35Z	{"log":"\u0009at java.base/sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:930)\n","stream":"stderr","time":"2022-07-16T23:35:35.312186929Z"}
2022-07-16T23:35:35Z	{"log":"\u0009at java.base/sun.nio.ch.SocketChannelImpl.ensureOpen(SocketChannelImpl.java:160)\n","stream":"stderr","time":"2022-07-16T23:35:35.312177654Z"}
2022-07-16T23:35:35Z	{"log":"java.nio.channels.ClosedChannelException\n","stream":"stderr","time":"2022-07-16T23:35:35.312168703Z"}
2022-07-16T23:35:35Z	{"log":"[23:35:35,311][WARNING][grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%][TcpCommunicationSpi] Failed to shutdown socket\n","stream":"stderr","time":"2022-07-16T23:35:35.312149651Z"}
2022-07-16T23:35:35Z	{"log":"[23:35:35,311][INFO][disco-event-worker-#52][GridDiscoveryManager]   ^-- Baseline [id=0, size=2, online=2, offline=0]\n","stream":"stderr","time":"2022-07-16T23:35:35.311279556Z"}
2022-07-16T23:35:35Z	{"log":"[23:35:35,311][INFO][disco-event-worker-#52][GridDiscoveryManager] Topology snapshot [ver=766, locNode=b54b6cde, servers=2, clients=2, state=ACTIVE, CPUs=97, offheap=2.4GB, heap=19.0GB, aliveNodes=[TcpDiscoveryNode [id=b54b6cde-9d34-473f-afdf-9f0a7d843e2f, consistentId=932603da-955c-4b1a-8717-2ce2e875b20c, isClient=false, ver=2.12.0#20220108-sha1:b1289f75], TcpDiscoveryNode [id=5f797546-6f89-460c-b102-85d4ec7d1189, consistentId=5f797546-6f89-460c-b102-85d4ec7d1189, isClient=true, ver=2.12.0#20220108-sha1:b1289f75], TcpDiscoveryNode [id=6daf16ec-368f-44d5-9751-fa7b440eb264, consistentId=b849e31e-152b-4f1b-afd7-3bc107181677, isClient=false, ver=2.12.0#20220108-sha1:b1289f75], TcpDiscoveryNode [id=260cbabb-4302-49ac-ad53-fe32ca021026, consistentId=260cbabb-4302-49ac-ad53-fe32ca021026, isClient=true, ver=2.12.0#20220108-sha1:b1289f75]]]\n","stream":"stderr","time":"2022-07-16T23:35:35.311243307Z"}
2022-07-16T23:35:35Z	{"log":"[23:35:35,310][INFO][disco-event-worker-#52][GridDiscoveryManager] Node left topology: TcpDiscoveryNode [id=8d5b3dad-7f4c-4a79-8d98-c5f6b05a0fc7, consistentId=8d5b3dad-7f4c-4a79-8d98-c5f6b05a0fc7, addrs=ArrayList [10.233.96.94, 127.0.0.1], sockAddrs=HashSet [/127.0.0.1:0, /10.233.96.94:0], discPort=0, order=759, intOrder=382, lastExchangeTime=1658007338647, loc=false, ver=2.12.0#20220108-sha1:b1289f75, isClient=true]\n","stream":"stderr","time":"2022-07-16T23:35:35.311002915Z"}
2022-07-16T23:35:30Z	{"log":"[23:35:29,936][INFO][sys-#196456][GridCachePartitionExchangeManager] Fast replied to single message [exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=765, minorTopVer=0], discoEvt=null, nodeId=260cbabb, evt=NODE_JOINED], nodeId=260cbabb-4302-49ac-ad53-fe32ca021026]\n","stream":"stderr","time":"2022-07-16T23:35:29.936446634Z"}
2022-07-16T23:35:29Z	{"log":"[23:35:29,755][INFO][grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.233.96.120:47100, rmtAddr=/10.233.96.87:35836]\n","stream":"stderr","time":"2022-07-16T23:35:29.755516795Z"}
2022-07-16T23:35:23Z	{"log":"[23:35:23,675][INFO][exchange-worker-#54][GridCachePartitionExchangeManager] AffinityTopologyVersion [topVer=765, minorTopVer=0], evt=NODE_JOINED, evtNode=260cbabb-4302-49ac-ad53-fe32ca021026, client=false]\n","stream":"stderr","time":"2022-07-16T23:35:23.676101402Z"}
2022-07-16T23:35:23Z	{"log":"[23:35:23,673][INFO][exchange-worker-#54][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=765, minorTopVer=0], crd=true]\n","stream":"stderr","time":"2022-07-16T23:35:23.673609447Z"}
2022-07-16T23:35:23Z	{"log":"[23:35:23,673][INFO][exchange-worker-#54][GridDhtPartitionsExchangeFuture] Exchange longest local stages [startVer=AffinityTopologyVersion [topVer=765, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=765, minorTopVer=0]]\n","stream":"stderr","time":"2022-07-16T23:35:23.673147256Z"}
2022-07-16T23:35:23Z	{"log":"[23:35:23,672][INFO][exchange-worker-#54][GridDhtPartitionsExchangeFuture] Exchange timings [startVer=AffinityTopologyVersion [topVer=765, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=765, minorTopVer=0], stage=\"Waiting in exchange queue\" (1 ms), stage=\"Exchange parameters initialization\" (28 ms), stage=\"Determine exchange type\" (1 ms), stage=\"Exchange done\" (4 ms), stage=\"Total time\" (34 ms)]\n","stream":"stderr","time":"2022-07-16T23:35:23.673103341Z"}
2022-07-16T23:35:23Z	{"log":"[23:35:23,672][INFO][exchange-worker-#54][GridDhtPartitionsExchangeFuture] Completed partition exchange [localNode=b54b6cde-9d34-473f-afdf-9f0a7d843e2f, exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=765, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode [id=260cbabb-4302-49ac-ad53-fe32ca021026, consistentId=260cbabb-4302-49ac-ad53-fe32ca021026, addrs=ArrayList [10.233.96.87, 127.0.0.1], sockAddrs=HashSet [/127.0.0.1:0, /10.233.96.87:0], discPort=0, order=765, intOrder=385, lastExchangeTime=1658014523534, loc=false, ver=2.12.0#20220108-sha1:b1289f75, isClient=true], rebalanced=true, done=true, newCrdFut=null], topVer=AffinityTopologyVersion [topVer=765, minorTopVer=0]]\n","stream":"stderr","time":"2022-07-16T23:35:23.672962195Z"}
2022-07-16T23:35:23Z	{"log":"[23:35:23,669][INFO][exchange-worker-#54][GridDhtPartitionsExchangeFuture] Finish exchange future [startVer=AffinityTopologyVersion [topVer=765, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=765, minorTopVer=0], err=null, rebalanced=true, wasRebalanced=true]\n","stream":"stderr","time":"2022-07-16T23:35:23.669332487Z"}
2022-07-16T23:35:23Z	{"log":"[23:35:23,666][INFO][exchange-worker-#54][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=765, minorTopVer=0], crd=true, evt=NODE_JOINED, evtNode=260cbabb-4302-49ac-ad53-fe32ca021026, customEvt=null, allowMerge=true, exchangeFreeSwitch=false]\n","stream":"stderr","time":"2022-07-16T23:35:23.666941459Z"}
2022-07-16T23:35:23Z	{"log":"[23:35:23,636][INFO][disco-event-worker-#52][GridDiscoveryManager]   ^-- Baseline [id=0, size=2, online=2, offline=0]\n","stream":"stderr","time":"2022-07-16T23:35:23.636329237Z"}
2022-07-16T23:35:23Z	{"log":"[23:35:23,636][INFO][disco-event-worker-#52][GridDiscoveryManager] Topology snapshot [ver=765, locNode=b54b6cde, servers=2, clients=3, state=ACTIVE, CPUs=112, offheap=2.4GB, heap=29.0GB, aliveNodes=[TcpDiscoveryNode [id=b54b6cde-9d34-473f-afdf-9f0a7d843e2f, consistentId=932603da-955c-4b1a-8717-2ce2e875b20c, isClient=false, ver=2.12.0#20220108-sha1:b1289f75], TcpDiscoveryNode [id=8d5b3dad-7f4c-4a79-8d98-c5f6b05a0fc7, consistentId=8d5b3dad-7f4c-4a79-8d98-c5f6b05a0fc7, isClient=true, ver=2.12.0#20220108-sha1:b1289f75], TcpDiscoveryNode [id=5f797546-6f89-460c-b102-85d4ec7d1189, consistentId=5f797546-6f89-460c-b102-85d4ec7d1189, isClient=true, ver=2.12.0#20220108-sha1:b1289f75], TcpDiscoveryNode [id=6daf16ec-368f-44d5-9751-fa7b440eb264, consistentId=b849e31e-152b-4f1b-afd7-3bc107181677, isClient=false, ver=2.12.0#20220108-sha1:b1289f75], TcpDiscoveryNode [id=260cbabb-4302-49ac-ad53-fe32ca021026, consistentId=260cbabb-4302-49ac-ad53-fe32ca021026, isClient=true, ver=2.12.0#20220108-sha1:b1289f75]]]\n","stream":"stderr","time":"2022-07-16T23:35:23.63628893Z"}
2022-07-16T23:35:23Z	{"log":"[23:35:23,635][INFO][disco-event-worker-#52][GridDiscoveryManager] Added new node to topology: TcpDiscoveryNode [id=260cbabb-4302-49ac-ad53-fe32ca021026, consistentId=260cbabb-4302-49ac-ad53-fe32ca021026, addrs=ArrayList [10.233.96.87, 127.0.0.1], sockAddrs=HashSet [/127.0.0.1:0, /10.233.96.87:0], discPort=0, order=765, intOrder=385, lastExchangeTime=1658014523534, loc=false, ver=2.12.0#20220108-sha1:b1289f75, isClient=true]\n","stream":"stderr","time":"2022-07-16T23:35:23.635776719Z"}
2022-07-16T23:35:23Z	{"log":"[23:35:23,635][WARNING][disco-event-worker-#52][GridDiscoveryManager] Local node's value of 'java.net.preferIPv4Stack' system property differs from remote node's (all nodes in topology should have identical value) [locPreferIpV4=true, rmtPreferIpV4=null, locId8=b54b6cde, rmtId8=260cbabb, rmtAddrs=[matchtv-79f66d7b74-592dk/10.233.96.87, /127.0.0.1], rmtNode=ClusterNode [id=260cbabb-4302-49ac-ad53-fe32ca021026, order=765, addr=[10.233.96.87, 127.0.0.1], daemon=false]]\n","stream":"stderr","time":"2022-07-16T23:35:23.635635404Z"}
2022-07-16T23:35:23Z	{"log":"[23:35:23,505][INFO][tcp-disco-sock-reader-[260cbabb 10.233.96.87:63377 client]-#406-#196458][TcpDiscoverySpi] Initialized connection with remote client node [nodeId=260cbabb-4302-49ac-ad53-fe32ca021026, rmtAddr=/10.233.96.87:63377]\n","stream":"stderr","time":"2022-07-16T23:35:23.505549029Z"}
2022-07-16T23:35:23Z	{"log":"[23:35:23,500][INFO][tcp-disco-sock-reader-[]-#406-#196458][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/10.233.96.87:63377, rmtPort=63377]\n","stream":"stderr","time":"2022-07-16T23:35:23.500308518Z"}
2022-07-16T23:35:23Z	{"log":"[23:35:23,499][INFO][tcp-disco-srvr-[:47500]-#3-#48][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/10.233.96.87, rmtPort=63377]\n","stream":"stderr","time":"2022-07-16T23:35:23.499715134Z"}
2022-07-16T23:35:23Z	{"log":"[23:35:23,499][INFO][tcp-disco-srvr-[:47500]-#3-#48][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/10.233.96.87, rmtPort=63377]\n","stream":"stderr","time":"2022-07-16T23:35:23.499496959Z"}
2022-07-16T23:35:10Z	{"log":"[23:35:10,403][INFO][db-checkpoint-thread-#73][Checkpointer] Checkpoint finished [cpId=231ae9d9-54a2-481f-a9e1-f75cb84498fe, pages=4277, markPos=WALPointer [idx=28382, fileOff=19093091, len=107140], walSegmentsCovered=[], markDuration=34ms, pagesWrite=252ms, fsync=127ms, total=528ms]\n","stream":"stderr","time":"2022-07-16T23:35:10.404052963Z"}
2022-07-16T23:35:10Z	{"log":"[23:35:10,024][INFO][db-checkpoint-thread-#73][Checkpointer] Checkpoint started [checkpointId=231ae9d9-54a2-481f-a9e1-f75cb84498fe, startPtr=WALPointer [idx=28382, fileOff=19093091, len=107140], checkpointBeforeLockTime=115ms, checkpointLockWait=0ms, checkpointListenersExecuteTime=8ms, checkpointLockHoldTime=9ms, walCpRecordFsyncDuration=15ms, writeCheckpointEntryDuration=7ms, splitAndSortCpPagesDuration=2ms, pages=4277, reason='timeout']\n","stream":"stderr","time":"2022-07-16T23:35:10.024274266Z"}
2022-07-16T23:34:37Z	{"log":"    ^-- Striped thread pool [active=0, idle=8, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:34:37.211690807Z"}
2022-07-16T23:34:37Z	{"log":"    ^-- System thread pool [active=0, idle=7, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:34:37.211682936Z"}
2022-07-16T23:34:37Z	{"log":"    ^-- Public thread pool [active=0, idle=0, qSize=0]\n","stream":"stderr","time":"2022-07-16T23:34:37.211674936Z"}
2022-07-16T23:34:37Z	{"log":"    ^-- Outbound messages queue [size=0]\n","stream":"stderr","time":"2022-07-16T23:34:37.211667206Z"}
2022-07-16T23:34:37Z	{"log":"    ^-- Ignite persistence [used=3175MB]\n","stream":"stderr","time":"2022-07-16T23:34:37.211658979Z"}
2022-07-16T23:34:37Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=0MB]\n","stream":"stderr","time":"2022-07-16T23:34:37.211650509Z"}
2022-07-16T23:34:37Z	{"log":"    ^--   volatileDsMemPlc region [type=user, persistence=false, lazyAlloc=true,\n","stream":"stderr","time":"2022-07-16T23:34:37.211642001Z"}
2022-07-16T23:34:37Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=100MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:34:37.211632665Z"}
2022-07-16T23:34:37Z	{"log":"    ^--   TxLog region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:34:37.211605127Z"}
2022-07-16T23:34:37Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.92%, allocRam=0MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:34:37.211595695Z"}
2022-07-16T23:34:37Z	{"log":"    ^--   metastoreMemPlc region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:34:37.211586515Z"}
2022-07-16T23:34:37Z	{"log":"      ...  initCfg=256MB, maxCfg=1228MB, usedRam=1205MB, freeRam=1.93%, allocRam=1228MB, allocTotal=3175MB]\n","stream":"stderr","time":"2022-07-16T23:34:37.21157708Z"}
2022-07-16T23:34:37Z	{"log":"    ^--   default region [type=default, persistence=true, lazyAlloc=true,\n","stream":"stderr","time":"2022-07-16T23:34:37.211567844Z"}
2022-07-16T23:34:37Z	{"log":"      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.98%, allocRam=100MB, allocTotal=0MB]\n","stream":"stderr","time":"2022-07-16T23:34:37.211558432Z"}
2022-07-16T23:34:37Z	{"log":"    ^--   sysMemPlc region [type=internal, persistence=true, lazyAlloc=false,\n","stream":"stderr","time":"2022-07-16T23:34:37.211548913Z"}
2022-07-16T23:34:37Z	{"log":"    ^-- Page memory [pages=304942]\n","stream":"stderr","time":"2022-07-16T23:34:37.211539217Z"}
2022-07-16T23:34:37Z	{"log":"    ^-- Off-heap memory [used=1205MB, free=26.01%, allocated=1428MB]\n","stream":"stderr","time":"2022-07-16T23:34:37.211530006Z"}
2022-07-16T23:34:37Z	{"log":"    ^-- Heap [used=1779MB, free=40.09%, comm=2969MB]\n","stream":"stderr","time":"2022-07-16T23:34:37.211520818Z"}
2022-07-16T23:34:37Z	{"log":"    ^-- CPU [CPUs=1, curLoad=1.33%, avgLoad=10.46%, GC=0%]\n","stream":"stderr","time":"2022-07-16T23:34:37.211511806Z"}
2022-07-16T23:34:37Z	{"log":"    ^-- Network [addrs=[10.233.96.120, 127.0.0.1], discoPort=47500, commPort=47100]\n","stream":"stderr","time":"2022-07-16T23:34:37.211502467Z"}
2022-07-16T23:34:37Z	{"log":"    ^-- Cluster [hosts=4, CPUs=97, servers=2, clients=2, topVer=764, minorTopVer=1]\n","stream":"stderr","time":"2022-07-16T23:34:37.21149256Z"}
2022-07-16T23:34:37Z	{"log":"    ^-- Node [id=b54b6cde, uptime=17 days, 04:04:11.940]\n","stream":"stderr","time":"2022-07-16T23:34:37.211483048Z"}
2022-07-16T23:34:37Z	{"log":"Metrics for local node (to disable set 'metricsLogFrequency' to 0)\n","stream":"stderr","time":"2022-07-16T23:34:37.211471168Z"}
2022-07-16T23:34:37Z	{"log":"[23:34:37,211][INFO][grid-timeout-worker-#22][IgniteKernal] \n","stream":"stderr","time":"2022-07-16T23:34:37.211414118Z"}

Re: How to fix lost partitions gracefully?

Posted by Айсина Роза <ra...@gpmdata.ru>.
Hi Semyon!  

  

Thank you for the tip - I will try to investigate the OOM problem.  

One more question about Ignite: is it right that if some caches grow too large
for the heap, they are written out to persistence (since it is enabled)?  

Or will the cluster crash with an OOM in that case?  

We have caches that constantly grow over time without any expiry policy (yet) -
could they be the source of the problem?  
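
From what I have read, cache entries are kept off-heap in data regions rather than
on the Java heap, and with persistence enabled a region can hold more data than its
in-memory size because cold pages stay on disk. A minimal sketch of the kind of
region configuration I mean (in Java; the sizes are illustrative, not our real
config) would be:

    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class PersistentRegionSketch {
        public static IgniteConfiguration config() {
            // Cache entries live off-heap in this region, not on the Java heap.
            DataRegionConfiguration region = new DataRegionConfiguration()
                .setName("default")
                .setInitialSize(256L * 1024 * 1024)   // 256 MB, illustrative
                .setMaxSize(1228L * 1024 * 1024)      // ~1.2 GB, illustrative
                .setPersistenceEnabled(true);         // cold pages spill to disk instead of failing

            DataStorageConfiguration storage = new DataStorageConfiguration()
                .setDefaultDataRegionConfiguration(region);

            // The heap is still used for query results, deserialized values, etc.,
            // so the JVM can run out of heap even when the data region itself is fine.
            return new IgniteConfiguration().setDataStorageConfiguration(storage);
        }
    }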

  

Regarding lost-partition recovery - I just logged into one of the nodes and
executed this command:  

    control.sh --cache reset_lost_partitions PipelineConfig

The exit status was 0 with no errors, but the cache was still unavailable :(  

I am not sure whether the new cluster was active after its two server nodes restarted.  

What is the correct way to do this with persistence enabled?
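
If I understand the docs correctly, the programmatic equivalent would be something
like the sketch below (the config/client.xml path is just a placeholder on my side,
and I have not verified this against our setup):

    import java.util.Collections;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cluster.ClusterState;

    public class ResetLostPartitionsSketch {
        public static void main(String[] args) {
            // Placeholder path to a thick-client configuration.
            try (Ignite ignite = Ignition.start("config/client.xml")) {
                // With persistence enabled the cluster starts INACTIVE after a restart
                // until it is activated (manually or via baseline auto-adjust).
                if (ignite.cluster().state() != ClusterState.ACTIVE)
                    ignite.cluster().state(ClusterState.ACTIVE);

                // Same effect as: control.sh --cache reset_lost_partitions PipelineConfig
                // Only meaningful once the baseline nodes that owned the data are online again.
                ignite.resetLostPartitions(Collections.singleton("PipelineConfig"));
            }
        }
    }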

  

Best regards,  

Rose.

* * *

**From:** Данилов Семён <sa...@yandex.ru>  
**Sent:** Thursday, July 21, 2022 8:22:41 PM  
**To:** user@ignite.apache.org  
**Subject:** Re: How to fix lost partitions gracefully?



Hello, Rose!  
  
It sure is hard to figure out why the pod restarts, but "Killed" in the logs makes
me think that it might be due to the out-of-memory killer.  
You can try searching for the OOM killer in journalctl (or whichever log utility
you have on your particular system).  
  
As for lost partitions, could you please elaborate on how you used control.sh
to reset lost partitions? I tried reproducing your scenario (but without OOM)
and I got it to work with the reset lost partitions command.  
Sadly at this moment, Ignite doesn't reset lost partitions automatically, but
we do have a ticket addressing this issue
<https://issues.apache.org/jira/browse/IGNITE-15653>.  
  
Kind regards,  
Semyon.  
  
  
> Hello Семён!  
>  
> Thank you for your reply!  
>  
> 2. Yeah, at some point all nodes go down and, as they are created by
> StatefulSet, k8s recreates each with the same pod name (ignite-0, ignite-1).
> I can be sure because the number of restarts changes from 0 to 1. I just fully
> repeated the deployment instructions from the  
> official docs (how to deploy Ignite on k8s).
> 3. I attached the cluster configuration to this email. Persistence is
> enabled and files are stored on the disk of a k8s node (by nodeSelectors I make
> sure that all pods look into the same folder on disk).
> 4. I am not sure which ID you mean.  
>  
> What I see in Grafana logs (sorry, a lot of text below):  
>  
> ignite-1: ID changes from b54b6cde to  
> 7776249c when the node restarts (note: no errors before restart):  
>  
> // old ID  
>  
> {"log":" ^--  
> Node [id=b54b6cde, uptime=17 days,
> 09:20:13.458]\n","stream":"stderr","time":"2022-07-17T04:50:38.730606703Z"}  
>  
> // here it restarts with the same consistentId and new node ID  
>  
> {"log":"[04:51:01,500][INFO][main][PdsFoldersResolver]  
> Consistent ID used for local node is [932603da-955c-4b1a-8717-2ce2e875b20c]  
> according to persistence data storage
> folders\n","stream":"stderr","time":"2022-07-17T04:51:01.500301744Z"}  
>  
> // sees the second node  
>  
> {"log":"[04:51:44,707][INFO][tcp-disco-msg-
> worker-[]-#2-#55][TcpDiscoverySpi]  
> New next node [newNext=TcpDiscoveryNode
> [id=6daf16ec-368f-44d5-9751-fa7b440eb264,
> consistentId=b849e31e-152b-4f1b-afd7-3bc107181677, addrs=ArrayList
> [10.233.96.234, 127.0.0.1], sockAddrs=HashSet [ignite-0.ignite.rec-
> matchtv.svc.cluster.local/10.233.96.234:47500,  
> /127.0.0.1:47500], discPort=47500, order=764, intOrder=384,
> lastExchangeTime=1658033499670, loc=false,
> ver=2.12.0#20220108-sha1:b1289f75,
> isClient=false]]\n","stream":"stderr","time":"2022-07-17T04:51:44.708116152Z"}  
>  
> {"log":"\u003e\u003e\u003e  
> Local node [ID=7776249C-FB90-4761-B3BD-79244C97EAB7, order=773,
> clientMode=false]\n","stream":"stderr","time":"2022-07-17T04:51:49.08998977Z"}  
>  
> ignite-0: sees that old ignite-1 node failed, lost partitions were detected
> and new node was connected with the same  
> consistentId:  
>  
> {"log":"[04:50:54,905][WARNING][disco-event-
> worker-#61][GridDiscoveryManager]  
> Node FAILED: TcpDiscoveryNode [id=b54b6cde-9d34-473f-afdf-9f0a7d843e2f,
> consistentId=932603da-955c-4b1a-8717-2ce2e875b20c, addrs=ArrayList
> [10.233.96.120, 127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47500,  
> ignite-1.ignite.rec-matchtv.svc.cluster.local/10.233.96.120:47500],
> discPort=47500, order=2, intOrder=2, lastExchangeTime=1658008371506,
> loc=false, ver=2.12.0#20220108-sha1:b1289f75,
> isClient=false]\n","stream":"stderr","time":"2022-07-17T04:50:54.905678158Z"}  
>  
> {"log":"[04:50:55,172][WARNING][sys-#3337][GridDhtPartitionTopologyImpl]  
> Detected lost partitions [grp=PipelineConfig, parts=[2, 12-21, 23, ...],
> topVer=AffinityTopologyVersion [topVer=772,
> minorTopVer=0]]\n","stream":"stderr","time":"2022-07-17T04:50:55.172591305Z"}  
>  
> {"log":"[04:51:39,521][INFO][tcp-disco-msg-
> worker-[crd]-#2-#55][TcpDiscoverySpi]  
> New next node [newNext=TcpDiscoveryNode
> [id=7776249c-fb90-4761-b3bd-79244c97eab7,
> consistentId=932603da-955c-4b1a-8717-2ce2e875b20c, addrs=ArrayList
> [10.233.96.120, 127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47500,  
> ignite-1.ignite.rec-matchtv.svc.cluster.local/10.233.96.120:47500],
> discPort=47500, order=0, intOrder=388, lastExchangeTime=1658033499392,
> loc=false, ver=2.12.0#20220108-sha1:b1289f75,
> isClient=false]]\n","stream":"stderr","time":"2022-07-17T04:51:39.52124616Z"}  
>  
> Note that they do not go down  
> simultaneously: ignite-1 goes down, restarts, and ignite-0 sees the fresh node and
> makes a connection. Moreover, ignite-0 itself restarts significantly earlier at  
> 2022-07-16 21:51:30 with the same sequence of events. It also has no
> errors except:  
> {"log":"Killed\n","stream":"stderr","time":"2022-07-16T21:51:54.018444887Z"}  
> just before restart.  
>  
> Corrupted errors started to appear at  
> 2022-07-17 05:37:19 on ignite-1 with the same  
> id=7776249c and uptime=00:46:00.268:  
>  
> {"log":"[05:37:18,805][SEVERE][client-
> connector-#436][ClientListenerNioListener] Failed to process client request
> [req=o.a.i.i.processors.platform.client.cache.ClientCacheScanQueryRequest@6e7493d,  
> msg=class o.a.i.i.processors.cache.CacheInvalidStateException: Failed to
> execute query because cache partition has been lostParts
> [cacheName=PipelineConfig,
> part=0]]\n","stream":"stderr","time":"2022-07-17T05:37:18.807327604Z"}  
>  
> On ignite-0 (which restarted earlier) corrupted errors appear earlier at
> 2022-07-17 03:41:09 with  
> id=6daf16ec and uptime=05:48:01.522 without any reason  
> and at the same time there are no errors on ignite-1.  
>  
> Honestly I don't understand what is going on. Long story short that's what
> happens:  
>  
> 2022-07-16 21:51:30 ignite-0 restarts  
>  
> 2022-07-17 03:41:09 ignite-0 has corrupted errors without any beforehand
> events or stacktraces and ignite-1  
> lives happily without errors  
>  
> 2022-07-17 04:51:01 ignite-1 restarts  
>  
> 2022-07-17  
> 05:37:19 ignite-1 starts to have corrupted errors  
>  
> What causes the ignite-0 restart - I don't know :( Everything seems fine and I
> can't find any events on the k8s cluster. Also the same question remains - why
> is persistent data lost when the consistent ID is the same, the folders on disk are
> the same, and this happens not  
> immediately but after several hours.  
>  
> Please help 🙏  
>  
> Best regards,  
>  
> Rose.  
>  
> From: Данилов Семён <sa...@yandex.ru>  
>  
> Sent: Tuesday, July 19, 2022 10:44:22 AM  
>  
> To: user@ignite.apache.org  
>  
> Subject: Re: How to fix lost partitions gracefully?  
>  
> Hello Роза,  
>  
> You're right, having persistence enabled should prevent the cluster from losing
> partitions, given that all nodes are online of course. So if a node (or the
> whole cluster) goes down, they should not be lost after the restart.  
>  
> I have a couple of questions:  
>  
> 1. Do I understand correctly that you observe server nodes go down and k8s
> recreates them?  
>  
> 2. Can you provide your cluster configuration?  
>  
> 3. Can you check that the nodes that are started are the same nodes that went
> down? A (re-)started node should have the same consistent id as the node that
> went down. If it doesn't, then it's a brand new node with no persistence.  
>  
> Regards, Semyon.  
>  
>> Hi Stephen!  
>  
>>  
>  
>> Thanks for your reply!  
>  
>>  
>  
>> 2. Well, that's the problem - I can't figure out why all server nodes go
down. Nobody uses this cluster except my two apps with client nodes. And
nothing happens before the unexpected shutdown and recreation of the server pods. The k8s
cluster seems fine as well.  
>  
>>  
>  
>> 3. Also I have persistence enabled (data is saved on the disk of a single k8s
node). Why, when the server pods are recreated, can't they restore their caches from
persistence automatically? I thought this is the main goal of persistence - to
save data.  
>  
>> 4. Unfortunately resetting partitions didn't help :( The control script returned
a 0 exit code but it was still impossible to retrieve data from the corrupted cache
(same error). So I deleted the cache data, redeployed the whole Ignite cluster and
now everything works fine.  
>  
>> But it's very costly to do this every time the Ignite server nodes are
recreated, which shouldn't be a "stop-the-world" problem since the data is persisted.  
>  
>>  
>  
>> 5. I guess that backing up partitions will not help, as both nodes went
down at the same time. It seems to me that all partitions would then be
lost, including those that were backed up.  
>  
>>  
>  
>> Best regards,  
>  
>>  
>  
>> Rose.  
>  
>>  
>  
>> From: Stephen Darlington <st...@gridgain.com>  
>  
>>  
>  
>> Sent: Monday, July 18, 2022 5:54 PM  
>  
>>  
>  
>> To: user  
>  
>>  
>  
>> Subject: Re: How to fix lost partitions gracefully?  
>  
>>  
>  
>> Client nodes disconnecting is not the problem here. You have  
>  
>> server nodes going down.  
>  
>>  
>  
>> Caches are split into partitions, which are then distributed across the
nodes in your cluster. If one of your data nodes goes down, and you have not
configured any backup partitions, then you will lose some partitions and the
data in them.  
>  
>>  
>  
>> There’s a script you can run to “reset lost partitions”: control-script  
>  
>>  
>  
>> Of course this does not magically bring the data back.  
>  
>>  
>  
>> You perhaps need to consider more nodes and configure your caches with at
least one backup.  
>  
>>  
>  
>>> On 18 Jul 2022, at 12:49, Айсина Роза <ra...@gpmdata.ru> wrote:  
>  
>>>  
>  
>>> Hello!  
>  
>>>  
>  
>>> We have an Ignite standalone cluster in a k8s environment with 2 server nodes
and several clients - a Java Spring application and a Spark application.  
>  
>>>  
>  
>>> Both apps raise client nodes to connect to the cluster every two hours (a
rolling-update redeploy of both apps happens).  
>  
>>>  
>  
>>> The whole setup is in k8s in one namespace.  
>  
>>>  
>  
>>> There is strange behavior we see sporadically after  
>  
>>> several weeks.  
>  
>>>  
>  
>>> The cache both apps use often becomes corrupted with the following
exception:  
>  
>>>  
>  
>>> [10:57:43,951][SEVERE][client-connector-#2796][ClientListenerNioListener]
Failed to process client request
[req=o.a.i.i.processors.platform.client.cache.ClientCacheScanQueryRequest@78481268,
msg=class  
>  
>>> o.a.i.i.processors.cache.CacheInvalidStateException: Failed to execute
query because cache partition has been lostParts [cacheName=PipelineConfig,
part=0]]  
>  
>>>  
>  
>>> javax.cache.CacheException: class
org.apache.ignite.internal.processors.cache.CacheInvalidStateException: Failed
to execute query because cache partition  
>  
>>> has been lostParts [cacheName=PipelineConfig, part=0]  
>  
>>>  
>  
>>> I looked through the server logs from both Ignite nodes and found some
events that I cannot understand.  
>  
>>>  
>  
>>> I attached logs - one with keyword = "Exception" to locate errors and the
other - original logs when first lost partitions error happens.  
>  
>>>  
>  
>>> It seems that this error is causing this behavior: Failed to shutdown
socket  
>  
>>>  
>  
>>> After this, all interactions with the cluster become impossible.  
>  
>>>  
>  
>>> Also there are so many errors like this: Client disconnected abruptly due
to network connection loss or because the connection was  
>  
>>> left open on application shutdown.  
>  
>>>  
>  
>>> So I have two questions:  
>  
>>>  
>  
>>> 2. Can you please help investigate the main reason for the lost partitions
error and how to handle it automatically? Right now I manually redeploy the
whole cluster and then all applications connected to it, which is insane and
very human-dependent.  
>  
>>> 3. Are there any special actions I need to take to gracefully handle client
nodes when apps are going to shut down? Is it possible that frequent (every 2h)
connect-then-die events from client nodes cause this behavior?  
>  
>>>  
>  
>>> Thank you in advance! Looking forward to any help! 🙏  
>  
>>  
>  



Re: How to fix lost partitions gracefully?

Posted by Данилов Семён <sa...@yandex.ru>.
Hello, Rose!

It sure is hard to figure out why the pod restarts, but "Killed" in the logs makes me think that it might be due to the out-of-memory killer.
You can try searching for the OOM killer in journalctl (or whichever log utility you have on your particular system).

As for lost partitions, could you please elaborate on how you used control.sh to reset lost partitions? I tried reproducing your scenario (but without OOM) and I got it to work with the reset lost partitions command.
Sadly at this moment, Ignite doesn't reset lost partitions automatically, but we do have a ticket addressing this issue https://issues.apache.org/jira/browse/IGNITE-15653.
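
Until that ticket is implemented, one rough interim workaround (just a sketch in Java, not a recommended pattern - it does not verify that all baseline nodes are back online, which you must ensure before resetting) is to listen for the partition-data-lost event in your application and trigger the reset from code:

    import java.util.Collections;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.events.CacheRebalancingEvent;
    import org.apache.ignite.events.Event;
    import org.apache.ignite.events.EventType;

    public class LostPartitionWatcher {
        public static void register(Ignite ignite) {
            // EVT_CACHE_REBALANCE_PART_DATA_LOST must be listed in
            // IgniteConfiguration.setIncludeEventTypes(...) for this listener to fire.
            ignite.events().localListen((Event evt) -> {
                CacheRebalancingEvent e = (CacheRebalancingEvent) evt;
                System.out.println("Lost partition " + e.partition() + " in cache " + e.cacheName());

                // In a real setup you would check the baseline and hand this off to
                // another thread instead of resetting directly in the listener.
                ignite.resetLostPartitions(Collections.singleton(e.cacheName()));

                return true; // keep listening
            }, EventType.EVT_CACHE_REBALANCE_PART_DATA_LOST);
        }
    }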

Kind regards,
Semyon.


> Hello Семён!
> 
> Thank you for your reply!
> 
> 2. Yeah, at some point all nodes go down and, as they are created by StatefulSet, k8s recreates each with the same pod name (ignite-0, ignite-1). I can be sure because the number of restarts changes from 0 to 1. I just fully repeated the deployment instructions from the
> official docs (how to deploy Ignite on k8s).
> 3. I attached the cluster configuration to this email. Persistence is enabled and files are stored on the disk of a k8s node (by nodeSelectors I make sure that all pods look into the same folder on disk).
> 4. I am not sure which ID you mean.
> 
> What I see in Grafana logs (sorry, a lot of text below):
> 
> ignite-1: ID changes from b54b6cde to
> 7776249c when the node restarts (note: no errors before restart):
> 
> // old ID
> 
> {"log":" ^--
> Node [id=b54b6cde, uptime=17 days, 09:20:13.458]\n","stream":"stderr","time":"2022-07-17T04:50:38.730606703Z"}
> 
> // here it restarts with the same consistentId and new node ID
> 
> {"log":"[04:51:01,500][INFO][main][PdsFoldersResolver]
> Consistent ID used for local node is [932603da-955c-4b1a-8717-2ce2e875b20c]
> according to persistence data storage folders\n","stream":"stderr","time":"2022-07-17T04:51:01.500301744Z"}
> 
> // sees the second node
> 
> {"log":"[04:51:44,707][INFO][tcp-disco-msg-worker-[]-#2-#55][TcpDiscoverySpi]
> New next node [newNext=TcpDiscoveryNode [id=6daf16ec-368f-44d5-9751-fa7b440eb264, consistentId=b849e31e-152b-4f1b-afd7-3bc107181677, addrs=ArrayList [10.233.96.234, 127.0.0.1], sockAddrs=HashSet [ignite-0.ignite.rec-matchtv.svc.cluster.local/10.233.96.234:47500,
> /127.0.0.1:47500], discPort=47500, order=764, intOrder=384, lastExchangeTime=1658033499670, loc=false, ver=2.12.0#20220108-sha1:b1289f75, isClient=false]]\n","stream":"stderr","time":"2022-07-17T04:51:44.708116152Z"}
> 
> {"log":"\u003e\u003e\u003e
> Local node [ID=7776249C-FB90-4761-B3BD-79244C97EAB7, order=773, clientMode=false]\n","stream":"stderr","time":"2022-07-17T04:51:49.08998977Z"}
> 
> ignite-0: sees that old ignite-1 node failed, lost partitions were detected and new node was connected with the same
> consistentId:
> 
> {"log":"[04:50:54,905][WARNING][disco-event-worker-#61][GridDiscoveryManager]
> Node FAILED: TcpDiscoveryNode [id=b54b6cde-9d34-473f-afdf-9f0a7d843e2f, consistentId=932603da-955c-4b1a-8717-2ce2e875b20c, addrs=ArrayList [10.233.96.120, 127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47500,
> ignite-1.ignite.rec-matchtv.svc.cluster.local/10.233.96.120:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1658008371506, loc=false, ver=2.12.0#20220108-sha1:b1289f75, isClient=false]\n","stream":"stderr","time":"2022-07-17T04:50:54.905678158Z"}
> 
> {"log":"[04:50:55,172][WARNING][sys-#3337][GridDhtPartitionTopologyImpl]
> Detected lost partitions [grp=PipelineConfig, parts=[2, 12-21, 23, ...], topVer=AffinityTopologyVersion [topVer=772, minorTopVer=0]]\n","stream":"stderr","time":"2022-07-17T04:50:55.172591305Z"}
> 
> {"log":"[04:51:39,521][INFO][tcp-disco-msg-worker-[crd]-#2-#55][TcpDiscoverySpi]
> New next node [newNext=TcpDiscoveryNode [id=7776249c-fb90-4761-b3bd-79244c97eab7, consistentId=932603da-955c-4b1a-8717-2ce2e875b20c, addrs=ArrayList [10.233.96.120, 127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47500,
> ignite-1.ignite.rec-matchtv.svc.cluster.local/10.233.96.120:47500], discPort=47500, order=0, intOrder=388, lastExchangeTime=1658033499392, loc=false, ver=2.12.0#20220108-sha1:b1289f75, isClient=false]]\n","stream":"stderr","time":"2022-07-17T04:51:39.52124616Z"}
> 
> Note that they do not go down
> simultaneously: ignite-1 goes down, restarts, and ignite-0 sees the fresh node and makes a connection. Moreover, ignite-0 itself restarts significantly earlier at
> 2022-07-16 21:51:30 with the same sequence of events. It also has no errors except:
> {"log":"Killed\n","stream":"stderr","time":"2022-07-16T21:51:54.018444887Z"}
> just before restart.
> 
> Corrupted errors started to appear at
> 2022-07-17 05:37:19 on ignite-1 with the same
> id=7776249c and uptime=00:46:00.268:
> 
> {"log":"[05:37:18,805][SEVERE][client-connector-#436][ClientListenerNioListener] Failed to process client request [req=o.a.i.i.processors.platform.client.cache.ClientCacheScanQueryRequest@6e7493d,
> msg=class o.a.i.i.processors.cache.CacheInvalidStateException: Failed to execute query because cache partition has been lostParts [cacheName=PipelineConfig, part=0]]\n","stream":"stderr","time":"2022-07-17T05:37:18.807327604Z"}
> 
> On ignite-0 (which restarted earlier) corrupted errors appear earlier at 2022-07-17 03:41:09 with
> id=6daf16ec and uptime=05:48:01.522 without any reason
> and at the same time there are no errors on ignite-1.
> 
> Honestly I don't understand what is going on. Long story short that's what happens:
> 
> 2022-07-16 21:51:30 ignite-0 restarts
> 
> 2022-07-17 03:41:09 ignite-0 has corrupted errors without any beforehand events or stacktraces and ignite-1
> lives happily without errors
> 
> 2022-07-17 04:51:01 ignite-1 restarts
> 
> 2022-07-17
> 05:37:19 ignite-1 starts to have corrupted errors
> 
> What causes the ignite-0 restart - I don't know :( Everything seems fine and I can't find any events on the k8s cluster. Also the same question remains - why is persistent data lost when the consistent ID is the same, the folders on disk are the same, and this happens not
> immediately but after several hours.
> 
> Please help 🙏
> 
> Best regards,
> 
> Rose.
> 
> From: Данилов Семён <sa...@yandex.ru>
> 
> Sent: Tuesday, July 19, 2022 10:44:22 AM
> 
> To: user@ignite.apache.org
> 
> Subject: Re: How to fix lost partitions gracefully?
> 
> Hello Роза,
> 
> You're right, having persistence enabled should prevent the cluster from losing partitions, given that all nodes are online of course. So if a node (or the whole cluster) goes down, they should not be lost after the restart.
> 
> I have a couple of questions:
> 
> 1. Do I understand correctly that you observe server nodes go down and k8s recreates them?
> 
> 2. Can you provide your cluster configuration?
> 
> 3. Can you check that the nodes that are started are the same nodes that went down? A (re-)started node should have the same consistent id as the node that went down. If it doesn't, then it's a brand new node with no persistence.
> 
> Regards, Semyon.
> 
>> Hi Stephen!
> 
>>
> 
>> Thanks for your reply!
> 
>>
> 
>> 2. Well, that's the problem - I can't figure out why all server nodes go down. Nobody uses this cluster except my two apps with client nodes. And nothing happens before the unexpected shutdown and recreation of the server pods. The k8s cluster seems fine as well.
> 
>>
> 
>> 3. Also I have persistence enabled (data is saved on the disk of a single k8s node). Why, when the server pods are recreated, can't they restore their caches from persistence automatically? I thought this is the main goal of persistence - to save data.
> 
>> 4. Unfortunately resetting partitions didn't help :( The control script returned a 0 exit code but it was still impossible to retrieve data from the corrupted cache (same error). So I deleted the cache data, redeployed the whole Ignite cluster and now everything works fine.
> 
>> But it's very costly to do this every time the Ignite server nodes are recreated, which shouldn't be a "stop-the-world" problem since the data is persisted.
> 
>>
> 
>> 5. I guess that backing up partitions will not help, as both nodes went down at the same time. It seems to me that all partitions would then be lost, including those that were backed up.
> 
>>
> 
>> Best regards,
> 
>>
> 
>> Rose.
> 
>>
> 
>> From: Stephen Darlington <st...@gridgain.com>
> 
>>
> 
>> Sent: Monday, July 18, 2022 5:54 PM
> 
>>
> 
>> To: user
> 
>>
> 
>> Subject: Re: How to fix lost partitions gracefully?
> 
>>
> 
>> Client nodes disconnecting is not the problem here. You have
> 
>> server nodes going down.
> 
>>
> 
>> Caches are split into partitions, which are then distributed across the nodes in your cluster. If one of your data nodes goes down, and you have not configured any backup partitions, then you will lose some partitions and the data in them.
> 
>>
> 
>> There’s a script you can run to “reset lost partitions”: control-script
> 
>>
> 
>> Of course this does not magically bring the data back.
> 
>>
> 
>> You perhaps need to consider more nodes and configure your caches with at least one backup.
> 
>>
> 
>>> On 18 Jul 2022, at 12:49, Айсина Роза <ra...@gpmdata.ru> wrote:
> 
>>>
> 
>>> Hello!
> 
>>>
> 
>>> We have an Ignite standalone cluster in a k8s environment with 2 server nodes and several clients - a Java Spring application and a Spark application.
> 
>>>
> 
>>> Both apps raise client nodes to connect to cluster each two hours (rolling update redeploy of both apps happens).
> 
>>>
> 
>>> The whole setup is in k8s in one namespace.
> 
>>>
> 
>>> There is strange behavior we see sporadically after
> 
>>> several weeks.
> 
>>>
> 
>>> The cache both apps use often becomes corrupted with the following exception:
> 
>>>
> 
>>> [10:57:43,951][SEVERE][client-connector-#2796][ClientListenerNioListener] Failed to process client request [req=o.a.i.i.processors.platform.client.cache.ClientCacheScanQueryRequest@78481268, msg=class
> 
>>> o.a.i.i.processors.cache.CacheInvalidStateException: Failed to execute query because cache partition has been lostParts [cacheName=PipelineConfig, part=0]]
> 
>>>
> 
>>> javax.cache.CacheException: class org.apache.ignite.internal.processors.cache.CacheInvalidStateException: Failed to execute query because cache partition
> 
>>> has been lostParts [cacheName=PipelineConfig, part=0]
> 
>>>
> 
>>> I went through the server logs from both Ignite nodes and found some events that I cannot understand.
> 
>>>
> 
>>> I attached logs - one with keyword = "Exception" to locate errors and the other - original logs when first lost partitions error happens.
> 
>>>
> 
>>> It seems that this error is causing this behavior: Failed to shutdown socket
> 
>>>
> 
>>> After this, all interaction with the cluster becomes impossible.
> 
>>>
> 
>>> Also there are so many errors like this: Client disconnected abruptly due to network connection loss or because the connection was
> 
>>> left open on application shutdown.
> 
>>>
> 
>>> So I have two questions:
> 
>>>
> 
>>> 1. Can you please help to investigate the main reason for the lost partitions error and how to handle it automatically? Right now I manually redeploy the whole cluster and then all applications connected to it, which is insane and very human-dependent.
> 
>>> 2. Are there any special actions I need to take to gracefully shut down client nodes when the apps are going to shut down? Is it possible that frequent (every 2h) connect-then-die events from client nodes cause this behavior?
> 
>>>
> 
>>> Thank you in advance! Looking forward to any help! 🙏
> 
>>
> 

Re: How to fix lost partitions gracefully?

Posted by Айсина Роза <ra...@gpmdata.ru>.
Hello Семён!  

  

Thank you for your reply!

  

  1. Yeah, at some point all nodes go down and, as they are created by a StatefulSet, k8s recreates each of them with the same pod name (ignite-0, ignite-1). I can be sure because the restart count changes from 0 to 1. I simply followed the deployment instructions from the official docs (how to deploy Ignite on k8s).
  2. I attached the cluster configuration to this email. Persistence is enabled and the files are stored on the disk of a k8s node (with nodeSelectors I make sure that all pods look into the same folder on disk).
  3. I am not sure which ID you mean (see the sketch right after this list).
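
If it helps, here is a minimal Java sketch (illustrative only - our real nodes use the attached Spring XML configuration, and the consistentId value and work-directory path below are assumptions rather than our actual values) of how I understand the two IDs: the node ID printed at the end is regenerated on every start, while the consistentId is supposed to stay stable so that a restarted pod can reuse its persistence files.

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class ServerNodeSketch {
        public static void main(String[] args) {
            // Native persistence for the default data region.
            DataStorageConfiguration storage = new DataStorageConfiguration()
                .setDefaultDataRegionConfiguration(
                    new DataRegionConfiguration()
                        .setName("default")
                        .setPersistenceEnabled(true));

            IgniteConfiguration cfg = new IgniteConfiguration()
                // Assumed values: the pod name and the mounted path below are
                // placeholders, not the real values from the attached config.
                .setConsistentId("ignite-1")               // stays the same across restarts
                .setWorkDirectory("/persistence/ignite-1") // volume the pod mounts
                .setDataStorageConfiguration(storage);

            Ignite ignite = Ignition.start(cfg);

            // The node ID changes on every restart; the consistent ID should not.
            System.out.println("node id       = " + ignite.cluster().localNode().id());
            System.out.println("consistent id = " + ignite.cluster().localNode().consistentId());
        }
    }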

  

What I see in Grafana logs (sorry, a lot of text below):  

  

**_ignite-1:_** the ID changes from **b54b6cde** to **7776249c** when the node
restarts (note: there are no errors before the restart):

  

// old ID  

{"log":" ^-- **Node [id=b54b6cde, uptime=17 days** ,
09:20:13.458]\n","stream":"stderr","time":"2022-07-17T04:50:38.730606703Z"}

  

// here it restarts with the same consistentId and new node ID  

{"log":"[04:51:01,500][INFO][main][PdsFoldersResolver] **Consistent ID used
for local node is [932603da-955c-4b1a-8717-2ce2e875b20c]** according to
persistence data storage
folders\n","stream":"stderr","time":"2022-07-17T04:51:01.500301744Z"}

  

// sees the second node  

{"log":"[04:51:44,707][INFO][tcp-disco-msg-worker-[]-#2-#55][TcpDiscoverySpi]
**New next node [newNext=TcpDiscoveryNode
[id=6daf16ec-368f-44d5-9751-fa7b440eb264,
consistentId=b849e31e-152b-4f1b-afd7-3bc107181677** , addrs=ArrayList
[10.233.96.234, 127.0.0.1], sockAddrs=HashSet [ **ignite-0**.ignite.rec-
matchtv.svc.cluster.local/10.233.96.234:47500, /127.0.0.1:47500],
discPort=47500, order=764, intOrder=384, lastExchangeTime=1658033499670,
loc=false, ver=2.12.0#20220108-sha1:b1289f75,
isClient=false]]\n","stream":"stderr","time":"2022-07-17T04:51:44.708116152Z"}

  

{"log":"\u003e\u003e\u003e **Local node
[ID=7776249C-FB90-4761-B3BD-79244C97EAB7** , order=773,
clientMode=false]\n","stream":"stderr","time":"2022-07-17T04:51:49.08998977Z"}

  

  

**_ignite-0:_** sees that the old ignite-1 node failed, lost partitions were
detected, and the new node connected with the same consistentId:

  

{"log":"[04:50:54,905][WARNING][disco-event-worker-#61][GridDiscoveryManager]
**Node FAILED: TcpDiscoveryNode [id=b54b6cde-9d34-473f-afdf-9f0a7d843e2f,
consistentId=932603da-955c-4b1a-8717-2ce2e875b20c** , addrs=ArrayList
[10.233.96.120, 127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47500,
**ignite-1**.ignite.rec-matchtv.svc.cluster.local/10.233.96.120:47500],
discPort=47500, order=2, intOrder=2, lastExchangeTime=1658008371506,
loc=false, ver=2.12.0#20220108-sha1:b1289f75,
isClient=false]\n","stream":"stderr","time":"2022-07-17T04:50:54.905678158Z"}

  

{"log":"[04:50:55,172][WARNING][sys-#3337][GridDhtPartitionTopologyImpl]
**Detected lost partitions** [grp=PipelineConfig, parts=[2, 12-21, 23, ...],
topVer=AffinityTopologyVersion [topVer=772,
minorTopVer=0]]\n","stream":"stderr","time":"2022-07-17T04:50:55.172591305Z"}

  

{"log":"[04:51:39,521][INFO][tcp-disco-msg-
worker-[crd]-#2-#55][TcpDiscoverySpi] **New next node
[newNext=TcpDiscoveryNode [id=7776249c-fb90-4761-b3bd-79244c97eab7,
consistentId=932603da-955c-4b1a-8717-2ce2e875b20c** , addrs=ArrayList
[10.233.96.120, 127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47500,
**ignite-1**.ignite.rec-matchtv.svc.cluster.local/10.233.96.120:47500],
discPort=47500, order=0, intOrder=388, lastExchangeTime=1658033499392,
loc=false, ver=2.12.0#20220108-sha1:b1289f75,
isClient=false]]\n","stream":"stderr","time":"2022-07-17T04:51:39.52124616Z"}

  

Note that they do not go down simultaneously: ignite-1 goes down, restarts, and
ignite-0 sees the fresh node and connects to it. Moreover, ignite-0 itself
restarts significantly earlier, at 2022-07-16 21:51:30, with the same sequence
of events. It also has no errors except:

{"log":" **Killed**
\n","stream":"stderr","time":"2022-07-16T21:51:54.018444887Z"} just before
restart.

  

  

Corruption errors started to appear at 2022-07-17 05:37:19 on ignite-1 with the
same id=7776249c and uptime=00:46:00.268:

  

{"log":"[05:37:18,805][SEVERE][client-
connector-#436][ClientListenerNioListener] Failed to process client request
[req=o.a.i.i.processors.platform.client.cache.ClientCacheScanQueryRequest@6e7493d,
msg=class o.a.i.i.processors.cache.CacheInvalidStateException: **Failed to
execute query because cache partition has been lostParts
[cacheName=PipelineConfig, part=0]]\n
","stream":"stderr","time":"2022-07-17T05:37:18.807327604Z"}**

  

On ignite-0 (which restarted earlier) the corruption errors appear earlier, at
2022-07-17 03:41:09, with id=6daf16ec and uptime=05:48:01.522, without any
apparent reason, and at the same time there are no errors on ignite-1.

  

Honestly, I don't understand what is going on. Long story short, this is what
happens:

  

2022-07-16 21:51:30 ignite-0 restarts

2022-07-17 03:41:09 ignite-0 starts raising corruption errors without any
preceding events or stack traces, while ignite-1 lives happily without errors

2022-07-17 04:51:01 ignite-1 restarts

2022-07-17 05:37:19 ignite-1 starts raising corruption errors

  

What causes the ignite-0 restart, I don't know :( Everything seems fine and I
can't find any events on the k8s cluster. Also the same question remains - why
is persistent data lost when the consistent ID is the same and the folders on
disk are the same, and why does this happen not immediately but only after
several hours?

  

Please help 🙏  

  

Best regards,  

Rose.  

* * *

**From:** Данилов Семён <sa...@yandex.ru>  
**Sent:** Tuesday, July 19, 2022 10:44:22 AM  
**To:** user@ignite.apache.org  
**Subject:** Re: How to fix lost partitions gracefully?



Hello Роза,  
  
You're right, having persistence enabled should prevent the cluster from losing
partitions, given that all nodes are online, of course. So if a node (or the
whole cluster) goes down, partitions should not be lost after the restart.
I have a couple of questions:  
1. Do I understand correctly that you observe server nodes go down and k8s
recreate them?
2. Can you provide your cluster configuration?
3. Can you check that nodes that are started are the same nodes that went
down? (Re-)started node should have the same consistent id as the node that
went down. If it doesn't, then it's a brand new node with no persistence.
  
Regards, Semyon.  
  
> Hi Stephen!
>
> Thanks for your reply!
>
> 2. Well, that's the problem - I can't figure out why all server nodes go
> down. Nobody uses this cluster except my two apps with client nodes. And
> nothing happens before the unexpected shutdown and recreation of the server
> pods. The k8s cluster seems fine as well.
>
> 3. Also I have persistence enabled (with data saved on disk on a single k8s
> node). Why can't the server pods restore their caches from persistence
> automatically when they are recreated? I thought this is the main goal of
> persistence - to save data.
> 4. Unfortunately resetting partitions didn't help :( The control script returned
> a 0 exit code but it was still impossible to retrieve data from the corrupted
> cache (same error). So I deleted the cache data, redeployed the whole Ignite
> cluster and now everything works fine.
> But it's very costly to do this every time the Ignite server nodes are
> recreated, which shouldn't be a "stop-the-world" problem as the data is saved.
>
> 5. I guess that backing up partitions would not help as both nodes went
> down at the same time. It seems to me that all partitions would then be
> lost, including those that were backed up.
>
> Best regards,
>
> Rose.
>  
> From: Stephen Darlington <st...@gridgain.com>  
>  
> Sent: Monday, July 18, 2022 5:54 PM  
>  
> To: user  
>  
> Subject: Re: How to fix lost partitions gracefully?  
>  
> Client nodes disconnecting is not the problem here. You have  
> server nodes going down.  
>  
> Caches are split into partitions, which are then distributed across the
> nodes in your cluster. If one of your data nodes goes down, and you have not
> configured any backup partitions, then you will lose some partitions and the
> data in them.  
>  
> There’s a script you can run to “reset lost partitions”: control-script  
>  
> Of course this does not magically bring the data back.  
>  
> You perhaps need to consider more nodes and configure your caches with at
> least one backup.  
>  
>> On 18 Jul 2022, at 12:49, Айсина Роза <ra...@gpmdata.ru> wrote:  
>>  
>> Hello!  
>>  
>> We have Ignite standalone cluster in k8s environment with 2 server nodes
and several clients - Java Spring application and Spark application.  
>>  
>> Both apps raise client nodes to connect to cluster each two hours (rolling
update redeploy of both apps happens).  
>>  
>> The whole setup is in k8s in one namespace.
>>  
>> There is strange behavior we see sporadically after  
>> several weeks.  
>>  
>> The cache both apps use often becomes corrupted with the following exception:
>>  
>> [10:57:43,951][SEVERE][client-connector-#2796][ClientListenerNioListener]
Failed to process client request
[req=o.a.i.i.processors.platform.client.cache.ClientCacheScanQueryRequest@78481268,
msg=class  
>> o.a.i.i.processors.cache.CacheInvalidStateException: Failed to execute
query because cache partition has been lostParts [cacheName=PipelineConfig,
part=0]]  
>>  
>> javax.cache.CacheException: class
org.apache.ignite.internal.processors.cache.CacheInvalidStateException: Failed
to execute query because cache partition  
>> has been lostParts [cacheName=PipelineConfig, part=0]  
>>  
>> I went through the server logs from both Ignite nodes and found some
events that I cannot understand.
>>  
>> I attached logs - one with keyword = "Exception" to locate errors and the
other - original logs when first lost partitions error happens.  
>>  
>> It seems that this error is causing this behavior: Failed to shutdown
socket  
>>  
>> After this, all interaction with the cluster becomes impossible.
>>  
>> Also there are so many errors like this: Client disconnected abruptly due
to network connection loss or because the connection was  
>> left open on application shutdown.  
>>  
>> So I have two questions:  
>>  
>> 1. Can you please help to investigate the main reason for the lost partitions
error and how to handle it automatically? Right now I manually redeploy the
whole cluster and then all applications connected to it, which is insane and
very human-dependent.
>> 2. Are there any special actions I need to take to gracefully shut down client
nodes when the apps are going to shut down? Is it possible that frequent (every 2h)
connect-then-die events from client nodes cause this behavior?
>>  
>> Thank you in advance! Looking forward to any help! 🙏
>  


Re: How to fix lost partitions gracefully?

Posted by Данилов Семён <sa...@yandex.ru>.
Hello Роза,

You're right, having persistence enabled should prevent the cluster from losing partitions, given that all nodes are online, of course. So if a node (or the whole cluster) goes down, partitions should not be lost after the restart.
I have a couple of questions:
1. Do I understand correctly that you observe server nodes go down and k8s recreates them?
2. Can you provide your cluster configuration?
3. Can you check that the nodes that are started are the same nodes that went down? A (re-)started node should have the same consistent ID as the node that went down. If it doesn't, then it's a brand new node with no persistence. (One quick way to check is sketched below.)
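
One quick way to check this - only a sketch, assuming you can start a small Java client with the discovery settings you already use (omitted here) - is to compare the consistent IDs recorded in the baseline topology with the consistent IDs of the server nodes that are currently alive; a restarted pod should reappear under its old consistent ID even though its node ID (UUID) is new:

    import java.util.stream.Collectors;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cluster.BaselineNode;
    import org.apache.ignite.cluster.ClusterNode;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class ConsistentIdCheck {
        public static void main(String[] args) {
            // Join as a client node; discovery settings are omitted and would come
            // from the same configuration the applications already use.
            try (Ignite ignite = Ignition.start(new IgniteConfiguration().setClientMode(true))) {
                // Consistent IDs the cluster expects (recorded in the baseline topology).
                System.out.println("baseline consistent ids: " +
                    ignite.cluster().currentBaselineTopology().stream()
                        .map(BaselineNode::consistentId)
                        .collect(Collectors.toList()));

                // Consistent IDs of the server nodes that are alive right now. A restarted
                // pod should show up here under its old consistent ID, with a new node ID.
                System.out.println("alive server consistent ids: " +
                    ignite.cluster().forServers().nodes().stream()
                        .map(ClusterNode::consistentId)
                        .collect(Collectors.toList()));
            }
        }
    }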

Regards, Semyon. 

> Hi Stephen!
> 
> Thanks for your reply!
> 
> 2. Well, that's the problem - I can't figure out why all server nodes go down. Nobody uses this cluster except my two apps with client nodes. And nothing happens before the unexpected shutdown and recreation of the server pods. The k8s cluster seems fine as well.
> 
> 3. Also I have persistence enabled (with data saved on disk on a single k8s node). Why can't the server pods restore their caches from persistence automatically when they are recreated? I thought this is the main goal of persistence - to save data.
> 4. Unfortunately resetting partitions didn't help :( The control script returned a 0 exit code but it was still impossible to retrieve data from the corrupted cache (same error). So I deleted the cache data, redeployed the whole Ignite cluster and now everything works fine.
> But it's very costly to do this every time the Ignite server nodes are recreated, which shouldn't be a "stop-the-world" problem as the data is saved.
> 
> 5. I guess that backing up partitions would not help as both nodes went down at the same time. It seems to me that all partitions would then be lost, including those that were backed up.
> 
> Best regards,
> 
> Rose.
> 
> From: Stephen Darlington <st...@gridgain.com>
> 
> Sent: Monday, July 18, 2022 5:54 PM
> 
> To: user
> 
> Subject: Re: How to fix lost partitions gracefully?
> 
> Client nodes disconnecting is not the problem here. You have
> server nodes going down.
> 
> Caches are split into partitions, which are then distributed across the nodes in your cluster. If one of your data nodes goes down, and you have not configured any backup partitions, then you will lose some partitions and the data in them.
> 
> There’s a script you can run to “reset lost partitions”: control-script
> 
> Of course this does not magically bring the data back.
> 
> You perhaps need to consider more nodes and configure your caches with at least one backup.
> 
>> On 18 Jul 2022, at 12:49, Айсина Роза <ra...@gpmdata.ru> wrote:
>>
>> Hello!
>>
>> We have Ignite standalone cluster in k8s environment with 2 server nodes and several clients - Java Spring application and Spark application.
>>
>> Both apps raise client nodes to connect to cluster each two hours (rolling update redeploy of both apps happens).
>>
>> The whole setup is in k8s in one namespace.
>>
>> There is strange behavior we see sporadically after
>> several weeks.
>>
>> The cache both apps use often becomes corrupted with the following exception:
>>
>> [10:57:43,951][SEVERE][client-connector-#2796][ClientListenerNioListener] Failed to process client request [req=o.a.i.i.processors.platform.client.cache.ClientCacheScanQueryRequest@78481268, msg=class
>> o.a.i.i.processors.cache.CacheInvalidStateException: Failed to execute query because cache partition has been lostParts [cacheName=PipelineConfig, part=0]]
>>
>> javax.cache.CacheException: class org.apache.ignite.internal.processors.cache.CacheInvalidStateException: Failed to execute query because cache partition
>> has been lostParts [cacheName=PipelineConfig, part=0]
>>
>> I went through the server logs from both Ignite nodes and found some events that I cannot understand.
>>
>> I attached logs - one with keyword = "Exception" to locate errors and the other - original logs when first lost partitions error happens.
>>
>> It seems that this error is causing this behavior: Failed to shutdown socket
>>
>> After this, all interaction with the cluster becomes impossible.
>>
>> Also there are so many errors like this: Client disconnected abruptly due to network connection loss or because the connection was
>> left open on application shutdown.
>>
>> So I have two questions:
>>
>> 1. Can you please help to investigate the main reason for the lost partitions error and how to handle it automatically? Right now I manually redeploy the whole cluster and then all applications connected to it, which is insane and very human-dependent.
>> 2. Are there any special actions I need to take to gracefully shut down client nodes when the apps are going to shut down? Is it possible that frequent (every 2h) connect-then-die events from client nodes cause this behavior?
>>
>> Thank you in advance! Looking forward to any help! 🙏
> 

Re: How to fix lost partitions gracefully?

Posted by Айсина Роза <ra...@gpmdata.ru>.
Hi Stephen!

  

Thanks for your reply!

  

  1. Well, that's the problem - I can't figure out why _all_ server nodes go down. Nobody uses this cluster except my two apps with client nodes. And nothing happens before the unexpected shutdown and recreation of the server pods. The k8s cluster seems fine as well.

  2. Also I have persistence enabled (with data saved on disk on a single k8s node). Why can't the server pods restore their caches from persistence automatically when they are recreated? I thought this is the main goal of persistence - to save data.
  3. Unfortunately resetting partitions didn't help :( The control script returned a 0 exit code but it was still impossible to retrieve data from the corrupted cache (same error). So I deleted the cache data, redeployed the whole Ignite cluster and now everything works fine. But it's very costly to do this every time the Ignite server nodes are recreated, which shouldn't be a "stop-the-world" problem as the data is saved. (A programmatic equivalent of the reset I tried is sketched right after this list.)

  4. I guess that backing up partitions would not help as _both_ nodes went down at the same time. It seems to me that all partitions would then be lost, including those that were backed up.
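
For reference, the reset I ran through the control script also has a rough programmatic equivalent in the Java API (this is only a sketch - the client's discovery configuration is omitted, and the cache name is simply the one from the error message). Like the script, it only clears the "lost" flag and cannot bring back data that had no backup copy:

    import java.util.Collections;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class ResetLostPartitionsSketch {
        public static void main(String[] args) {
            // Connect as a client node; discovery settings omitted for brevity.
            try (Ignite ignite = Ignition.start(new IgniteConfiguration().setClientMode(true))) {
                // Which partitions does the cluster currently consider lost?
                System.out.println("lost partitions: " +
                    ignite.cache("PipelineConfig").lostPartitions());

                // Clear the "lost" state for the cache from the error message.
                // This only resets the flag - it does not restore data that had
                // no backup copy on a surviving node.
                ignite.resetLostPartitions(Collections.singleton("PipelineConfig"));
            }
        }
    }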

  

Best regards,  

Rose.  

  
  

* * *

**From:** Stephen Darlington <st...@gridgain.com>  
**Sent:** Monday, July 18, 2022 5:54 PM  
**To:** user  
**Subject:** Re: How to fix lost partitions gracefully?



Client nodes disconnecting is not the problem here. You have _server_  nodes
going down.

  

Caches are split into partitions, which are then distributed across the nodes
in your cluster. If one of your data nodes goes down, and you have not
configured any backup partitions, then you will lose some partitions and the
data in them.

  

There’s a script you can run to “reset lost partitions”: [control-script](https://ignite.apache.org/docs/2.11.1/tools/control-script#resetting-lost-partitions)

  

Of course this does not magically bring the data back.

  

You perhaps need to consider more nodes and configure your caches with at
least one backup.  

  

> On 18 Jul 2022, at 12:49, Айсина Роза
> <[raysina@gpmdata.ru](mailto:raysina@gpmdata.ru)> wrote:
>
>  
>
>
> Hello!  
>
>
>  
>
>
> We have Ignite standalone cluster in k8s environment with 2 server nodes and
> several clients - Java Spring application and Spark application.
>
> Both apps raise client nodes to connect to cluster each two hours (rolling
> update redeploy of both apps happens).
>
> The whole setup is in k8s in one namespace.
>
>
>  
>
>
> There is strange behavior we see sporadically after several weeks.
>
> The cache both apps use often becomes corrupted with the following exception:
>
>
>  
>
>
> [10:57:43,951][SEVERE][client-connector-#2796][ClientListenerNioListener]
> Failed to process client request
> [req=o.a.i.i.processors.platform.client.cache.ClientCacheScanQueryRequest@78481268,
> msg=class o.a.i.i.processors.cache.CacheInvalidStateException: Failed to
> execute query because cache partition has been lostParts
> [cacheName=PipelineConfig, part=0]]
>
>
> javax.cache.CacheException: class
> org.apache.ignite.internal.processors.cache.CacheInvalidStateException:
> Failed to execute query because cache partition has been lostParts
> [cacheName=PipelineConfig, part=0]
>
>  
>
>
> I went through the server logs from both Ignite nodes and found some
> events that I cannot understand.
>
>
> I attached logs - one with keyword = "Exception" to locate errors and the
> other - original logs when first lost partitions error happens.
>
>  
>
>
> It seems that this error is causing this behavior: Failed to shutdown
> socket
>
> After this, all interaction with the cluster becomes impossible.
>
>
> Also there are so many errors like this: Client disconnected abruptly due
> to network connection loss or because the connection was left open on
> application shutdown.
>
>
> So I have two questions:  
>
>
>   1. Can you please help to investigate the main reason for the lost partitions
> error and how to handle it automatically? Right now I manually redeploy the
> whole cluster and then all applications connected to it, which is insane and
> very human-dependent.
>   2. Are there any special actions I need to take to gracefully shut down client
> nodes when the apps are going to shut down? Is it possible that frequent (every 2h)
> connect-then-die events from client nodes cause this behavior?
>
>

>
>  
>
>
> Thank you in advance! Looking forward to any help! 🙏
>
>

  



Re: How to fix lost partitions gracefully?

Posted by Stephen Darlington <st...@gridgain.com>.
Client nodes disconnecting is not the problem here. You have server nodes going down.

Caches are split into partitions, which are then distributed across the nodes in your cluster. If one of your data nodes goes down, and you have not configured any backup partitions, then you will lose some partitions and the data in them.

There’s a script you can run to “reset lost partitions”: control-script <https://ignite.apache.org/docs/2.11.1/tools/control-script#resetting-lost-partitions>

Of course this does not magically bring the data back.

You perhaps need to consider more nodes and configure your caches with at least one backup.
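
As a rough illustration only (shown in Java here; the same backups and partitionLossPolicy properties can be set in the XML configuration, and READ_WRITE_SAFE is just one possible loss policy, not something prescribed in this thread), a partitioned cache with one backup copy looks something like this:

    import org.apache.ignite.cache.CacheMode;
    import org.apache.ignite.cache.PartitionLossPolicy;
    import org.apache.ignite.configuration.CacheConfiguration;

    public class BackupCacheConfigSketch {
        // A partitioned cache that keeps one backup copy of every partition, so the
        // data survives the loss of a single server node. READ_WRITE_SAFE makes
        // operations on lost partitions fail fast instead of silently working with
        // incomplete data.
        static CacheConfiguration<String, Object> pipelineConfigCache() {
            return new CacheConfiguration<String, Object>("PipelineConfig")
                .setCacheMode(CacheMode.PARTITIONED)
                .setBackups(1)
                .setPartitionLossPolicy(PartitionLossPolicy.READ_WRITE_SAFE);
        }
    }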

> On 18 Jul 2022, at 12:49, Айсина Роза <ra...@gpmdata.ru> wrote:
> 
> Hello! 
> 
> We have Ignite standalone cluster in k8s environment with 2 server nodes and several clients - Java Spring application and Spark application.
> Both apps raise client nodes to connect to cluster each two hours (rolling update redeploy of both apps happens).
> The whole setup is in k8s in one namespace. 
> 
> There is strange behavior we see sporadically after several weeks.
> The cache both apps use often becomes corrupted with the following exception: 
> 
> [10:57:43,951][SEVERE][client-connector-#2796][ClientListenerNioListener] Failed to process client request [req=o.a.i.i.processors.platform.client.cache.ClientCacheScanQueryRequest@78481268, msg=class o.a.i.i.processors.cache.CacheInvalidStateException: Failed to execute query because cache partition has been lostParts [cacheName=PipelineConfig, part=0]]
> javax.cache.CacheException: class org.apache.ignite.internal.processors.cache.CacheInvalidStateException: Failed to execute query because cache partition has been lostParts [cacheName=PipelineConfig, part=0]
> 
> I went through the server logs from both Ignite nodes and found some events that I cannot understand. 
> I attached logs - one with keyword = "Exception" to locate errors and the other - original logs when first lost partitions error happens.
> 
> It seems that this error is causing this behavior: Failed to shutdown socket
> After this, all interaction with the cluster becomes impossible. 
> Also there are so many errors like this: Client disconnected abruptly due to network connection loss or because the connection was left open on application shutdown.
> 
> So I have two questions: 
> Can you please help to investigate the main reason for the lost partitions error and how to handle it automatically? Right now I manually redeploy the whole cluster and then all applications connected to it, which is insane and very human-dependent.
> Are there any special actions I need to take to gracefully shut down client nodes when the apps are going to shut down? Is it possible that frequent (every 2h) connect-then-die events from client nodes cause this behavior? 
> 
> Thank you in advance! Looking forward to any help! 🙏
>