Posted to user@ignite.apache.org by Hemasundara Rao <he...@travelcentrictechnology.com> on 2019/03/05 06:07:47 UTC

Ignite Server critical failure and jvm restart according to segmentation policy

Hi,
Our Ignite server is restarting repeatedly with the following errors,
which is causing major problems in our environment:

[06:51:03,106][SEVERE][disco-event-worker-#42%StaticGrid27_CommonDev%][FailureProcessor]
Ignite node is in invalid state due to a critical failure.
[06:51:03,107][SEVERE][node-restarter][] Restarting JVM on Ignite failure:
[failureCtx=FailureContext [type=SEGMENTATION, err=null]]

We are unable to identify the reason for this critical failure.
Please let us know how to overcome it.

We are using a two-node cluster, and I am attaching logs from both servers.


Thanks and Regards,
Hemasundara Rao Pottangi  | Senior Project Leader

HotelHub LLP
Phone: +91 80 6741 8700
Cell: +91 99 4807 7054
Email: hemasundara.rao@hotelhub.com
Website: www.hotelhub.com <http://hotelhub.com/>
------------------------------

HotelHub LLP is a service provider working on behalf of Travel Centric
Technology Ltd, a company registered in the United Kingdom.
DISCLAIMER: This email message and all attachments are confidential and may
contain information that is Privileged, Confidential or exempt from
disclosure under applicable law. If you are not the intended recipient, you
are notified that any dissemination, distribution or copying of this email
is strictly prohibited. If you have received this email in error, please
notify us immediately by return email to
notices@travelcentrictechnology.com and
destroy the original message. Opinions, conclusions and other information
in this message that do not relate to the official business of Travel
Centric Technology Ltd or HotelHub LLP, shall be understood to be neither
given nor endorsed by either company.

Re: Ignite Server critical failure and jvm restart according to segmentation policy

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

[12:23:16,734][INFO][grid-nio-worker-tcp-comm-3-#27%StaticGrid27_CommonDev%][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/10.201.30.63:9600,
rmtAddr=/10.201.50.40:53366]
[12:24:00,259][INFO][tcp-disco-sock-reader-#368%StaticGrid27_CommonDev%][TcpDiscoverySpi]
Finished serving remote node connection [rmtAddr=/10.201.30.64:33763,
rmtPort=33763
[12:24:00,308][INFO][tcp-disco-srvr-#3%StaticGrid27_CommonDev%][TcpDiscoverySpi]
TCP discovery accepted incoming connection [rmtAddr=/10.201.30.64,
rmtPort=52085]
[12:24:00,308][INFO][tcp-disco-srvr-#3%StaticGrid27_CommonDev%][TcpDiscoverySpi]
TCP discovery spawning a new thread for connection [rmtAddr=/10.201.30.64,
rmtPort=52085]
[12:24:00,308][INFO][tcp-disco-sock-reader-#373%StaticGrid27_CommonDev%][TcpDiscoverySpi]
Started serving remote node connection [rmtAddr=/10.201.30.64:52085,
rmtPort=52085]
[12:24:00,316][WARNING][tcp-disco-msg-worker-#2%StaticGrid27_CommonDev%][TcpDiscoverySpi]
Node is out of topology (probably, due to short-time network problems).
[12:24:00,317][WARNING][disco-event-worker-#42%StaticGrid27_CommonDev%][GridDiscoveryManager]
Local node SEGMENTED: TcpDiscoveryNode
[id=087c1178-2fb5-428f-9cf2-03c0ea1b996d, addrs=[10.201.30.63],
sockAddrs=[/10.201.30.63:9200], discPort=9200, order=906, intOrder=469,
lastExchangeTime=1551529440310, loc=true, ver=2.7.0#20181130-sha1:256ae401,
isClient=false]

I suspect your node suffered a long GC pause and was segmented from the
cluster as a result. You can:

- Check whether heap usage spikes are what is causing the segmentation.
- Decrease the heap size (halve it, perhaps?) and see if the situation improves.
- Increase the failureDetectionTimeout setting to 120000 (two minutes).
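
For the first suggestion, here is a minimal sketch of one way to look for
heap spikes and long collections using only the standard JMX beans. It is not
taken from the logs in this thread. The class name GcPauseCheck, the sampling
interval, and the sample count are made up for illustration, and the 10000 ms
threshold assumes the node still uses what I recall to be the default
failureDetectionTimeout. Collection time as reported by the JVM is only a
rough proxy for stop-the-world pauses.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: samples heap usage and accumulated GC time in the current JVM.
public class GcPauseCheck {
    /** Sums the accumulated collection time (ms) over all collectors. */
    public static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Returns the currently used heap in bytes. */
    public static long heapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    /** True when the GC time accrued in one window is longer than the threshold. */
    public static boolean exceeds(long gcMillisInWindow, long thresholdMillis) {
        return gcMillisInWindow > thresholdMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        long thresholdMillis = 10_000L; // assumed failureDetectionTimeout, adjust to your config
        long intervalMillis = 500L;     // illustrative sampling interval
        int samples = 3;                // illustrative; a real watcher would loop indefinitely

        long previous = totalGcTimeMillis();
        for (int i = 0; i < samples; i++) {
            // A stop-the-world pause also freezes this thread, so the sleep overshoots
            // and the next delta covers the whole pause.
            Thread.sleep(intervalMillis);
            long current = totalGcTimeMillis();
            long delta = current - previous;
            previous = current;
            System.out.printf("heapUsed=%d MB, gcTimeInWindow=%d ms%s%n",
                heapUsedBytes() / (1024 * 1024), delta,
                exceeds(delta, thresholdMillis) ? "  <-- longer than the timeout" : "");
        }
    }
}
```

Note that this only observes the JVM it runs in, so it would have to run
inside the Ignite node's process. Enabling GC logging on the node
(-XX:+PrintGCDetails on Java 8, -Xlog:gc* on Java 9 and later) gives similar
information without any code.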

Regards,
-- 
Ilya Kasnacheev

