Posted to user@ignite.apache.org by Hemasundara Rao <he...@travelcentrictechnology.com> on 2018/11/23 04:41:48 UTC

Ignite cluster going down frequently

Hi All,
We are running a two-node Ignite server cluster.
It ran without any issue for almost 5 days. We use this grid for static
data, and the Ignite process sits at around 8 GB of memory after we load
our data.
Suddenly the grid server nodes started going down. We tried restarting the
server nodes and reloading the static data 3 times, but the server nodes
keep going down again and again.

Please let us know how to overcome this kind of issue.

Attached are the log file and configuration file.

*Following is the part of the log from one server:*

[04:45:58,335][WARNING][tcp-disco-msg-worker-#2%StaticGrid_NG_Dev%][TcpDiscoverySpi]
Node is out of topology (probably, due to short-time network problems).
[04:45:58,335][WARNING][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
Local node SEGMENTED: TcpDiscoveryNode
[id=8a825790-a987-42c3-acb0-b3ea270143e1, addrs=[10.201.30.63], sockAddrs=[/
10.201.30.63:47600], discPort=47600, order=42, intOrder=23,
lastExchangeTime=1542861958327, loc=true, ver=2.4.0#20180305-sha1:aa342270,
isClient=false]
[04:45:58,335][INFO][tcp-disco-sock-reader-#78%StaticGrid_NG_Dev%][TcpDiscoverySpi]
Finished serving remote node connection [rmtAddr=/10.201.30.64:36695,
rmtPort=36695
[04:45:58,337][INFO][tcp-disco-sock-reader-#70%StaticGrid_NG_Dev%][TcpDiscoverySpi]
Finished serving remote node connection [rmtAddr=/10.201.30.172:58418,
rmtPort=58418
[04:45:58,337][INFO][tcp-disco-sock-reader-#74%StaticGrid_NG_Dev%][TcpDiscoverySpi]
Finished serving remote node connection [rmtAddr=/10.201.10.125:63403,
rmtPort=63403
[04:46:01,516][INFO][tcp-comm-worker-#1%StaticGrid_NG_Dev%][TcpDiscoverySpi]
Pinging node: 6a603d8b-f8bf-40bf-af50-6c04a56b572e
[04:46:01,546][INFO][tcp-comm-worker-#1%StaticGrid_NG_Dev%][TcpDiscoverySpi]
Finished node ping [nodeId=6a603d8b-f8bf-40bf-af50-6c04a56b572e, res=true,
time=49ms]
[04:46:02,482][INFO][tcp-comm-worker-#1%StaticGrid_NG_Dev%][TcpDiscoverySpi]
Pinging node: 5ec6ee69-075e-4829-84ca-ae40411c7bc3
[04:46:02,482][INFO][tcp-comm-worker-#1%StaticGrid_NG_Dev%][TcpDiscoverySpi]
Finished node ping [nodeId=5ec6ee69-075e-4829-84ca-ae40411c7bc3, res=false,
time=7ms]
[04:46:08,283][INFO][tcp-disco-sock-reader-#4%StaticGrid_NG_Dev%][TcpDiscoverySpi]
Finished serving remote node connection [rmtAddr=/10.201.30.64:48038,
rmtPort=48038
[04:46:08,367][WARNING][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
Restarting JVM according to configured segmentation policy.
[04:46:08,388][WARNING][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
Node FAILED: TcpDiscoveryNode [id=20687a72-b5c7-48bf-a5ab-37bd3f7fa064,
addrs=[10.201.30.64], sockAddrs=[/10.201.30.64:47601], discPort=47601,
order=41, intOrder=22, lastExchangeTime=1542262724642, loc=false,
ver=2.4.0#20180305-sha1:aa342270, isClient=false]
[04:46:08,389][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
Topology snapshot [ver=680, servers=1, clients=17, CPUs=36, offheap=8.0GB,
heap=84.0GB]
[04:46:08,389][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
Data Regions Configured:
[04:46:08,389][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
 ^-- Default_Region [initSize=256.0 MiB, maxSize=8.0 GiB,
persistenceEnabled=false]
[04:46:08,396][INFO][exchange-worker-#42%StaticGrid_NG_Dev%][time] Started
exchange init [topVer=AffinityTopologyVersion [topVer=680, minorTopVer=0],
crd=true, evt=NODE_FAILED, evtNode=20687a72-b5c7-48bf-a5ab-37bd3f7fa064,
customEvt=null, allowMerge=true]
[04:46:08,398][INFO][exchange-worker-#42%StaticGrid_NG_Dev%][GridDhtPartitionsExchangeFuture]
Finished waiting for partition release future
[topVer=AffinityTopologyVersion [topVer=680, minorTopVer=0], waitTime=0ms,
futInfo=NA]
[04:46:08,398][INFO][exchange-worker-#42%StaticGrid_NG_Dev%][GridDhtPartitionsExchangeFuture]
Coordinator received all messages, try merge [ver=AffinityTopologyVersion
[topVer=680, minorTopVer=0]]
[04:46:08,398][INFO][exchange-worker-#42%StaticGrid_NG_Dev%][GridCachePartitionExchangeManager]
Stop merge, custom task found: WalStateNodeLeaveExchangeTask
[node=TcpDiscoveryNode [id=20687a72-b5c7-48bf-a5ab-37bd3f7fa064,
addrs=[10.201.30.64], sockAddrs=[/10.201.30.64:47601], discPort=47601,
order=41, intOrder=22, lastExchangeTime=1542262724642, loc=false,
ver=2.4.0#20180305-sha1:aa342270, isClient=false]]
[04:46:08,398][INFO][exchange-worker-#42%StaticGrid_NG_Dev%][GridDhtPartitionsExchangeFuture]
finishExchangeOnCoordinator [topVer=AffinityTopologyVersion [topVer=680,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=680, minorTopVer=0]]
[04:46:08,512][WARNING][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
Node FAILED: TcpDiscoveryNode [id=6a603d8b-f8bf-40bf-af50-6c04a56b572e,
addrs=[10.201.30.172], sockAddrs=[BLRVM-HHNG01.devdom/10.201.30.172:0],
discPort=0, order=98, intOrder=53, lastExchangeTime=1542348596592,
loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=true]
[04:46:08,512][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
Topology snapshot [ver=683, servers=1, clients=16, CPUs=36, offheap=8.0GB,
heap=78.0GB]
[04:46:08,512][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
Data Regions Configured:
[04:46:08,512][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
 ^-- Default_Region [initSize=256.0 MiB, maxSize=8.0 GiB,
persistenceEnabled=false]
[04:46:08,513][WARNING][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
Node FAILED: TcpDiscoveryNode [id=5ec6ee69-075e-4829-84ca-ae40411c7bc3,
addrs=[10.201.30.172], sockAddrs=[BLRVM-HHNG01.devdom/10.201.30.172:0],
discPort=0, order=129, intOrder=71, lastExchangeTime=1542360580600,
loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=true]
[04:46:08,513][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
Topology snapshot [ver=684, servers=1, clients=15, CPUs=36, offheap=8.0GB,
heap=72.0GB]
[04:46:08,513][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
Data Regions Configured:
[04:46:08,513][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
 ^-- Default_Region [initSize=256.0 MiB, maxSize=8.0 GiB,
persistenceEnabled=false]
[04:46:08,514][WARNING][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
Node FAILED: TcpDiscoveryNode [id=224648a6-e515-479e-88e4-44f7bceaeb14,
addrs=[10.201.50.96], sockAddrs=[BLRWSVERMA3420.devdom/10.201.50.96:0],
discPort=0, order=175, intOrder=96, lastExchangeTime=1542365246419,
loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=true]
[04:46:08,514][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
Topology snapshot [ver=685, servers=1, clients=14, CPUs=32, offheap=8.0GB,
heap=71.0GB]

-- 
Hemasundara Rao Pottangi  | Senior Project Leader

[image: HotelHub-logo]
HotelHub LLP
Phone: +91 80 6741 8700
Cell: +91 99 4807 7054
Email: hemasundara.rao@hotelhub.com
Website: www.hotelhub.com <http://hotelhub.com/>

Re: Ignite cluster going down frequently

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

[04:45:53,179][WARNING][tcp-disco-msg-worker-#2%StaticGrid_NG_Dev%][TcpDiscoverySpi]
Timed out waiting for message delivery receipt (most probably, the reason
is in long GC pauses on remote node; consider tuning GC and increasing
'ackTimeout' configuration property). Will retry to send message with
increased timeout [currentTimeout=10000, rmtAddr=/10.201.30.64:47603,
rmtPort=47603]
[04:45:53,180][WARNING][tcp-disco-msg-worker-#2%StaticGrid_NG_Dev%][TcpDiscoverySpi]
Failed to send message to next node [msg=TcpDiscoveryJoinRequestMessage
[node=TcpDiscoveryNode [id=47aa2976-0a02-4ffe-9c8d-3f0fbfcc532b,
addrs=[10.201.30.173], sockAddrs=[/10.201.30.173:0], discPort=0, order=0,
intOrder=0, lastExchangeTime=1542861943131, loc=false,
ver=2.4.0#20180305-sha1:aa342270, isClient=true],
dataPacket=o.a.i.spi.discovery.tcp.internal.DiscoveryDataPacket@6ce6ae2,
super=TcpDiscoveryAbstractMessage
[sndNodeId=8a825790-a987-42c3-acb0-b3ea270143e1,
id=5e14ec53761-47aa2976-0a02-4ffe-9c8d-3f0fbfcc532b, verifierNodeId=null,
topVer=0, pendingIdx=0, failedNodes=null, isClient=true]],
next=TcpDiscoveryNode [id=d7782a2e-4cfc-4427-8ba7-a9af3954ae3f,
addrs=[10.201.30.64], sockAddrs=[/10.201.30.64:47603], discPort=47603,
order=53, intOrder=32, lastExchangeTime=1542272829304, loc=false,
ver=2.4.0#20180305-sha1:aa342270, isClient=false],
errMsg=Failed to send message to next node [msg=TcpDiscoveryJoinRequestMessage
[node=TcpDiscoveryNode [id=47aa2976-0a02-4ffe-9c8d-3f0fbfcc532b,
addrs=[10.201.30.173], sockAddrs=[/10.201.30.173:0], discPort=0, order=0,
intOrder=0, lastExchangeTime=1542861943131, loc=false,
ver=2.4.0#20180305-sha1:aa342270, isClient=true],
dataPacket=o.a.i.spi.discovery.tcp.internal.DiscoveryDataPacket@6ce6ae2,
super=TcpDiscoveryAbstractMessage
[sndNodeId=8a825790-a987-42c3-acb0-b3ea270143e1,
id=5e14ec53761-47aa2976-0a02-4ffe-9c8d-3f0fbfcc532b, verifierNodeId=null,
topVer=0, pendingIdx=0, failedNodes=null, isClient=true]],
next=ClusterNode [id=d7782a2e-4cfc-4427-8ba7-a9af3954ae3f, order=53,
addr=[10.201.30.64], daemon=true]]]
[04:45:53,190][WARNING][tcp-disco-msg-worker-#2%StaticGrid_NG_Dev%][TcpDiscoverySpi]
Local node has detected failed nodes and started cluster-wide procedure. To
speed up failure detection please see 'Failure Detection' section under
javadoc for 'TcpDiscoverySpi'

and then, on another node:
[04:45:58,335][WARNING][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
Local node SEGMENTED: TcpDiscoveryNode
[id=8a825790-a987-42c3-acb0-b3ea270143e1, addrs=[10.201.30.63], sockAddrs=[/
10.201.30.63:47600], discPort=47600, order=42, intOrder=23,
lastExchangeTime=1542861958327, loc=true, ver=2.4.0#20180305-sha1:aa342270,
isClient=false]

I think that you either have long GC pauses or a flaky network (or the
system goes into swapping and the like).

Consider increasing 'ackTimeout' and/or 'failureDetectionTimeout'. Also
consider collecting GC logs for your nodes and looking through them for a
root cause.
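
For reference, here is a minimal sketch of raising those timeouts in a Java
configuration. The instance name is taken from the log prefix above; the
discovery addresses, port ranges and the 30-second value are placeholders,
not the reporter's actual settings, and the same properties can be set in
the Spring XML config instead:

import java.util.Arrays;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

public class TolerantDiscoveryStartup {
    public static void main(String[] args) {
        // Static IP finder listing both server nodes (addresses are placeholders).
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
        ipFinder.setAddresses(Arrays.asList(
            "10.201.30.63:47600..47610",
            "10.201.30.64:47600..47610"));

        TcpDiscoverySpi discoSpi = new TcpDiscoverySpi().setIpFinder(ipFinder);

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setIgniteInstanceName("StaticGrid_NG_Dev")
            .setDiscoverySpi(discoSpi)
            // Single knob covering discovery/communication failure detection.
            // Default is 10 s; raise it above your worst observed GC pause.
            .setFailureDetectionTimeout(30_000);

        // Alternatively, tune the discovery SPI directly; note that explicitly
        // set SPI timeouts take precedence over failureDetectionTimeout:
        // discoSpi.setAckTimeout(10_000).setSocketTimeout(10_000);

        Ignite ignite = Ignition.start(cfg);
    }
}

For the GC logs, standard JVM flags such as -verbose:gc and
-XX:+PrintGCDetails (or -Xlog:gc* on Java 9 and later) are enough to see
whether pauses approach the discovery timeouts.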

Regards,
-- 
Ilya Kasnacheev


Fri, 30 Nov 2018 at 14:01, Hemasundara Rao <
hemasundara.rao@travelcentrictechnology.com>:

> Hi Ilya Kasnacheev,
>
>  I am attaching all logs from second server (10.201.30.64).
> Please let me know if you need any other details.
>
> Thanks and Regards,
> Hemasundar.
>
> On Fri, 30 Nov 2018 at 09:40, Hemasundara Rao <
> hemasundara.rao@travelcentrictechnology.com> wrote:
>
>> Hi Ilya Kasnacheev,
>>
>>   We are running one cluster node (10.201.30.63). I am attaching all logs
>> from this server.
>> Please let me know if you need any other details.
>>
>> Thanks and Regards,
>> Hemasundar.
>>
>>
>> On Thu, 29 Nov 2018 at 20:07, Ilya Kasnacheev <il...@gmail.com>
>> wrote:
>>
>>> Hello!
>>>
>>> It is not clear from this log alone why this node became segmented. Do
>>> you have log from other server node in the topology? It was coordinator so
>>> maybe it was the one experiencing problems.
>>>
>>> Regards,
>>> --
>>> Ilya Kasnacheev
>>>
>>>
>>> Wed, 28 Nov 2018 at 13:56, Hemasundara Rao <
>>> hemasundara.rao@travelcentrictechnology.com>:
>>>
>>>> Hi Ilya Kasnacheev,
>>>>
>>>>  Did you get a chance to go through the attached log?
>>>> This is one of the critical issues we are facing in our dev environment.
>>>> It would be of great help to us to know what is causing this issue and
>>>> a probable solution to it.
>>>>
>>>> Thanks and Regards,
>>>> Hemasundar.
>>>>
>>>> On Mon, 26 Nov 2018 at 16:54, Hemasundara Rao <
>>>> hemasundara.rao@travelcentrictechnology.com> wrote:
>>>>
>>>>> Hi  Ilya Kasnacheev,
>>>>>   I have attached the log file.
>>>>>
>>>>> Regards,
>>>>> Hemasundar.
>>>>>
>>>>> On Mon, 26 Nov 2018 at 16:50, Ilya Kasnacheev <
>>>>> ilya.kasnacheev@gmail.com> wrote:
>>>>>
>>>>>> Hello!
>>>>>>
>>>>>> Maybe you have some data in your caches which causes runaway heap
>>>>>> usage in your own code. Previously you did not have such data, or code
>>>>>> which would react in such a fashion.
>>>>>>
>>>>>> It's hard to say; can you provide more logs from the node before it
>>>>>> segments?
>>>>>>
>>>>>> Regards,
>>>>>> --
>>>>>> Ilya Kasnacheev
>>>>>>
>>>>>>
>>>>>> Mon, 26 Nov 2018 at 14:17, Hemasundara Rao <
>>>>>> hemasundara.rao@travelcentrictechnology.com>:
>>>>>>
>>>>>>> Thank you very much, Ilya Kasnacheev, for your response.
>>>>>>>
>>>>>>> We load the data initially; after that only small delta changes are
>>>>>>> applied.
>>>>>>> The grid-down issue started after it had been running successfully
>>>>>>> for 2 to 3 days.
>>>>>>> Once the issue starts, it repeats frequently, and we have no clue
>>>>>>> what is causing it.
>>>>>>>
>>>>>>> Thanks and Regards,
>>>>>>> Hemasundar.
>>>>>>>
>>>>>>>
>>>>>>> On Mon, 26 Nov 2018 at 13:43, Ilya Kasnacheev <
>>>>>>> ilya.kasnacheev@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello!
>>>>>>>>
>>>>>>>> A node will get segmented if other nodes give up waiting for a Discovery
>>>>>>>> response from that node. This usually means either network problems or long
>>>>>>>> GC pauses caused by insufficient heap on one of the nodes.
>>>>>>>>
>>>>>>>> Make sure your data load process does not cause heap usage spikes.
>>>>>>>>
>>>>>>>> Regards.
>>>>>>>> --
>>>>>>>> Ilya Kasnacheev
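
As an aside on the heap-spike advice quoted above, here is a hedged sketch of
loading the static data through IgniteDataStreamer, which batches entries per
node instead of accumulating them on the heap first. The cache name
"staticData", the String key/value types and the config path are placeholders,
not the reporter's actual data model:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;

public class StaticDataLoader {
    public static void main(String[] args) {
        // Start a node from the cluster's Spring config (path is a placeholder).
        Ignite ignite = Ignition.start("ignite-config.xml");

        try (IgniteDataStreamer<String, String> streamer =
                 ignite.dataStreamer("staticData")) {
            streamer.allowOverwrite(true);    // let reloads replace existing entries
            streamer.perNodeBufferSize(1024); // flush to servers in modest batches

            for (int i = 0; i < 1_000_000; i++) {
                // In the real loader the rows come from the source system; the
                // point is to hand them to the streamer one at a time rather
                // than building a large on-heap collection and putting it at once.
                streamer.addData("key-" + i, "value-" + i);
            }
        } // close() flushes any remaining buffered entries
    }
}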

Re: Ignite cluster going down frequently

Posted by Hemasundara Rao <he...@travelcentrictechnology.com>.
Hi Ilya Kasnacheev,

 I am attaching all logs from the second server (10.201.30.64).
Please let me know if you need any other details.

Thanks and Regards,
Hemasundar.


Re: Ignite cluster going down frequently

Posted by Hemasundara Rao <he...@travelcentrictechnology.com>.
Hi Ilya Kasnacheev,

  We are running one cluster node on 10.201.30.63. I am attaching all logs
from this server.
Please let me know if you need any other details.

Thanks and Regards,
Hemasundar.



Re: Ignite cluster going down frequently

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

It is not clear from this log alone why this node became segmented. Do you
have the log from the other server node in the topology? It was the
coordinator, so maybe it was the one experiencing problems.

Regards,
-- 
Ilya Kasnacheev


>>>>>> Centric Technology Ltd, a company registered in the United Kingdom.
>>>>>> DISCLAIMER: This email message and all attachments are confidential
>>>>>> and may contain information that is Privileged, Confidential or exempt from
>>>>>> disclosure under applicable law. If you are not the intended recipient, you
>>>>>> are notified that any dissemination, distribution or copying of this email
>>>>>> is strictly prohibited. If you have received this email in error, please
>>>>>> notify us immediately by return email to
>>>>>> notices@travelcentrictechnology.com and destroy the original
>>>>>> message. Opinions, conclusions and other information in this message that
>>>>>> do not relate to the official business of Travel Centric Technology Ltd or
>>>>>> HotelHub LLP, shall be understood to be neither given nor endorsed by
>>>>>> either company.
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>

Re: Ignite cluster going down frequently

Posted by Hemasundara Rao <he...@travelcentrictechnology.com>.
Hi Ilya Kasnacheev,

Did you get a chance to go through the log I attached?
This is one of the critical issues we are facing in our dev environment.
Your input on what is causing this issue, and a probable solution to it,
would be of great help to us.

Thanks and Regards,
Hemasundar.


Re: Ignite cluster going down frequently

Posted by Hemasundara Rao <he...@travelcentrictechnology.com>.
Hi Ilya Kasnacheev,
  I have attached the log file.

Regards,
Hemasundar.


Re: Ignite cluster going down frequently

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

Maybe you have some data in your caches which causes runaway heap usage in
your own code. Previously you did not have such data, or code which would
react in such a fashion.

It's hard to say; can you provide more logs from the node before it
segments?
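
Alongside the Ignite log, verbose GC logs would show whether long pauses line
up with the segmentation (on the Java 8 JVMs typically used with Ignite 2.4,
for example -Xloggc:<path> together with -XX:+PrintGCDetails). Below is a
small, hypothetical sketch of one extra diagnostic: a local listener that
prints heap usage at the moment the node is segmented. It assumes the node is
started from Java and that EVT_NODE_SEGMENTED has been enabled via
IgniteConfiguration.setIncludeEventTypes; the class and method names are
illustrative only.

    import org.apache.ignite.Ignite;
    import org.apache.ignite.events.DiscoveryEvent;
    import org.apache.ignite.events.EventType;

    public class SegmentationHeapLogger {
        /** Registers a local listener that prints heap usage when this node is segmented. */
        public static void register(Ignite ignite) {
            // Requires EVT_NODE_SEGMENTED to be listed in IgniteConfiguration.setIncludeEventTypes(...).
            ignite.events().localListen(evt -> {
                Runtime rt = Runtime.getRuntime();
                long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
                System.out.println("Local node segmented: " + ((DiscoveryEvent) evt).shortDisplay()
                    + ", heap in use: " + usedMb + " MB");
                return true; // keep the listener registered
            }, EventType.EVT_NODE_SEGMENTED);
        }
    }

This is only a diagnostic aid; the GC log itself remains the primary evidence.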

Regards,
-- 
Ilya Kasnacheev



Re: Ignite cluster going down frequently

Posted by Hemasundara Rao <he...@travelcentrictechnology.com>.
Thank you very much, Ilya Kasnacheev, for your response.

We load the data initially; after that, only small delta changes are applied.
The grid-down issue happens after the cluster has been running successfully for 2 to 3 days.
Once the issue starts, it repeats frequently, and we have not found any clue as to why.

Thanks and Regards,
Hemasundar.



Re: Ignite cluster going down frequently

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

A node will get segmented if other nodes give up waiting for a Discovery
response from that node. This usually means either network problems or long
GC pauses caused by insufficient heap on one of the nodes.

Make sure your data load process does not cause heap usage spikes.
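
For the initial load itself, one option that usually keeps server heap
pressure flatter than plain cache puts is IgniteDataStreamer. A minimal,
hypothetical sketch follows; "client-config.xml" and "staticDataCache" are
placeholders (not the names from the attached configuration), and the cache
is assumed to already exist on the servers.

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteDataStreamer;
    import org.apache.ignite.Ignition;

    public class StaticDataLoader {
        public static void main(String[] args) {
            Ignition.setClientMode(true); // load from a client node, not a server

            try (Ignite ignite = Ignition.start("client-config.xml");
                 IgniteDataStreamer<Integer, String> streamer =
                     ignite.dataStreamer("staticDataCache")) {

                streamer.allowOverwrite(false);   // pure initial load, no updates
                streamer.perNodeBufferSize(1024); // batch size; tune as needed

                for (int i = 0; i < 1_000_000; i++)
                    streamer.addData(i, "static-record-" + i);
            } // closing the streamer flushes any remaining batches
        }
    }

The streamer batches updates per node and applies back-pressure, so the
loader cannot outrun the servers during the initial load.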

Regards.
-- 
Ilya Kasnacheev

