Posted to user@ignite.apache.org by Akash Shinde <ak...@gmail.com> on 2019/11/27 11:29:40 UTC

Local node terminated after segmentation

Hi,

I have started four server nodes. One of the nodes terminated unexpectedly
with the following error. Before the JVM terminated, the node was
segmented.

1) Does Ignite always treat node segmentation as a "Critical system error"
and use "StopNodeOrHaltFailureHandler" to take the required action, which is
"Terminate Node" in this case?

2) Are there any other reasons for the "Critical system error detected"
message?

I have not set the SegmentationPolicy explicitly. AFAIK Ignite does not
provide SegmentationResolver and SegmentationPolicy out of the box.

3) Do I need to implement SegmentationResolver and set the
SegmentationPolicy to "STOP" if I want to stop the JVM when the node is
segmented?

4) I am starting Ignite in embedded mode. When a node is segmented I want
to restart the JVM. Is there any way to do this? (I am not using
ignite.sh/ignite.bat to start Ignite.)

Please find the logs attached.

Exception:

2019-11-27 08:30:46,992 9321188 [disco-event-worker-#61%springDataNode%] WARN  o.a.i.i.m.d.GridDiscoveryManager - Local node SEGMENTED: TcpDiscoveryNode [id=b4fce076-cc7a-47ee-98fd-31e1d610b5de, addrs=[10.45.65.97, 127.0.0.1], sockAddrs=[/10.45.65.97:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1574843446983, loc=true, ver=2.6.0#20180710-sha1:669feacc, isClient=false]
2019-11-27 08:30:46,992 9321188 [disco-event-worker-#61%springDataNode%] WARN  o.a.i.i.m.d.GridDiscoveryManager - Local node SEGMENTED: TcpDiscoveryNode [id=b4fce076-cc7a-47ee-98fd-31e1d610b5de, addrs=[10.45.65.97, 127.0.0.1], sockAddrs=[/10.45.65.97:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1574843446983, loc=true, ver=2.6.0#20180710-sha1:669feacc, isClient=false]
2019-11-27 08:30:46,994 9321190 [tcp-disco-srvr-#3%springDataNode%] ERROR  - Critical system error detected. Will be handled accordingly to configured handler [hnd=class o.a.i.failure.StopNodeOrHaltFailureHandler, failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Thread tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]]
java.lang.IllegalStateException: Thread tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServer.body(ServerImpl.java:5686)
        at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
2019-11-27 08:30:46,994 9321190 [tcp-disco-srvr-#3%springDataNode%] ERROR  - Critical system error detected. Will be handled accordingly to configured handler [hnd=class o.a.i.failure.StopNodeOrHaltFailureHandler, failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Thread tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]]
java.lang.IllegalStateException: Thread tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServer.body(ServerImpl.java:5686)
        at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
2019-11-27 08:30:46,995 9321191 [tcp-disco-srvr-#3%springDataNode%] ERROR  - JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Thread tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]]
2019-11-27 08:30:46,995 9321191 [tcp-disco-srvr-#3%springDataNode%] ERROR  - JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Thread tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]]

Re: Local node terminated after segmentation

Posted by VeenaMithare <v....@cmcmarkets.com>.
Thanks Evgenii 



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Local node terminated after segmentation

Posted by Evgenii Zhuravlev <e....@gmail.com>.
Hi Veena,

There is a message in the logs:
 [WARNING][jvm-pause-detector-worker][IgniteKernal] Possible too long JVM pause: 8023 milliseconds.

In most cases this is a sign of a long GC pause. Of course, the JVM pause
can be caused by problems with a virtual environment or something else,
but usually it's GC. You can collect GC logs to confirm that it is GC.
If the JVM pause is longer than failureDetectionTimeout, the node can be
kicked out of the cluster.
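
The comparison described above can be sketched in plain Java. This is only
an illustration of the reasoning, not Ignite's actual pause detector: the
class and method names are hypothetical, and the timeout passed in is
whatever failureDetectionTimeout your cluster is configured with.

```java
/**
 * Sketch of the pause-vs-timeout reasoning: a watchdog thread sleeps for a
 * short, fixed interval and treats any large overshoot as a JVM pause.
 * All names here are illustrative, not part of the Ignite API.
 */
final class JvmPauseCheck {
    /** How much longer than requested the watchdog thread actually slept. */
    static long pauseMillis(long requestedSleepMs, long elapsedMs) {
        return Math.max(0, elapsedMs - requestedSleepMs);
    }

    /** True if the pause is longer than the timeout after which peers may drop the node. */
    static boolean exceedsTimeout(long pauseMs, long timeoutMs) {
        return pauseMs > timeoutMs;
    }

    /** Sleeps once and returns the measured overshoot in milliseconds. */
    static long measureOnce(long sleepMs) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(sleepMs);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        return pauseMillis(sleepMs, elapsedMs);
    }
}
```

For the 8023 ms pause from the log, exceedsTimeout(8023, timeout) is true
for any timeout below 8023 ms, which is the situation in which the rest of
the cluster may consider the node failed.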

Evgenii

Mon, 13 Apr 2020 at 15:32, VeenaMithare <v....@cmcmarkets.com> wrote:

> Hi Ilya,
>
> How can a node reachability resolver or TCP segmentation resolver help in
> discovering segmentation due to GC pauses? What is the best way to
> discover segmentation on a node due to GC pauses?
>
> regards,
> Veena

Re: Local node terminated after segmentation

Posted by VeenaMithare <v....@cmcmarkets.com>.
Hi Ilya, 

How can a node reachability resolver or TCP segmentation resolver help in
discovering segmentation due to GC pauses? What is the best way to discover
segmentation on a node due to GC pauses?

regards,
Veena

Re: Local node terminated after segmentation

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

Personally I've never seen a split brain. We recommend having collocated
clusters, in which case nodes will only fail one by one as opposed to
forming a segmented cluster.
But if you are really concerned about split brain, you can use
ZooKeeper-based discovery, since ZooKeeper has built-in split-brain
protection that you can rely on.
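
A minimal configuration sketch for ZooKeeper-based discovery follows. It
assumes the ignite-zookeeper module is on the classpath; the connection
string and timeout values are placeholders, not recommendations.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.zk.ZookeeperDiscoverySpi;

public class ZkDiscoveryExample {
    public static void main(String[] args) {
        ZookeeperDiscoverySpi zkSpi = new ZookeeperDiscoverySpi();

        // Placeholder addresses of the ZooKeeper ensemble (assumption).
        zkSpi.setZkConnectionString("zk1:2181,zk2:2181,zk3:2181");

        // Illustrative timeouts in milliseconds; tune for your environment.
        zkSpi.setSessionTimeout(30_000);
        zkSpi.setJoinTimeout(10_000);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDiscoverySpi(zkSpi);

        // Starts an embedded node that uses ZooKeeper for discovery.
        Ignite ignite = Ignition.start(cfg);
    }
}
```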

Regards,
-- 
Ilya Kasnacheev


Tue, 24 Dec 2019 at 14:37, Akash Shinde <ak...@gmail.com> wrote:

> Can someone please help me on this?

Re: Local node terminated after segmentation

Posted by Akash Shinde <ak...@gmail.com>.
Can someone please help me on this?

On Thu, Dec 12, 2019 at 1:11 PM Akash Shinde <ak...@gmail.com> wrote:

> Hi,
>
> Can you please explain at a high level how the GridGain implementation
> protects against having two segments alive at the same time, which could
> lead to data inconsistency over time? What exactly does it do to achieve this?
>
> Regards,
> A.

Re: Local node terminated after segmentation

Posted by Akash Shinde <ak...@gmail.com>.
Hi,

Can you please explain at a high level how the GridGain implementation
protects against having two segments alive at the same time, which could
lead to data inconsistency over time? What exactly does it do to achieve this?

Regards,
A.

On Wed, Dec 11, 2019 at 5:48 PM Stanislav Lukyanov <st...@gmail.com>
wrote:

> In Ignite a node can go into the "segmented" state in two cases:
> 1. The node was unavailable (sleeping, hanging in a full GC, etc.) for a
> long time.
> 2. The cluster detected a possible split-brain situation and marked the
> node as "segmented".
>
> Yes, split-brain protection (in GridGain implementation and in theory too)
> doesn't protect your node from stopping. It protects you from having two
> segments that are alive at the same time which could lead to data
> inconsistency over time.
>
> Regarding Discovery and large clusters. If your cluster is too big for the
> ring-based TcpDiscoverySpi to work well then you should use Zookeeper
> Discovery which was created specifically to support large clusters.
>
> Stan

Re: Local node terminated after segmentation

Posted by Stanislav Lukyanov <st...@gmail.com>.
In Ignite a node can go into the "segmented" state in two cases:
1. The node was unavailable (sleeping, hanging in a full GC, etc.) for a
long time.
2. The cluster detected a possible split-brain situation and marked the
node as "segmented".

Yes, split-brain protection (in GridGain implementation and in theory too)
doesn't protect your node from stopping. It protects you from having two
segments that are alive at the same time which could lead to data
inconsistency over time.
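
One generic way to get this guarantee is a majority quorum: a segment stays
alive only if it sees a strict majority of the expected nodes. The sketch
below illustrates that general technique only; it is not the GridGain
implementation, and the class and method names are hypothetical.

```java
import java.util.Set;

/** Generic majority-quorum check; not taken from Ignite or GridGain. */
final class MajorityQuorum {
    /**
     * A segment is considered valid only if it can see a strict majority of
     * the nodes that are expected to be in the cluster.
     */
    static boolean isValidSegment(Set<String> expectedNodes, Set<String> visibleNodes) {
        long visible = expectedNodes.stream().filter(visibleNodes::contains).count();
        return visible * 2 > expectedNodes.size();
    }
}
```

Two disjoint segments cannot both hold a strict majority, so at most one of
them stays alive. With an even split (two nodes on each side of a four-node
cluster) neither side passes the check, which is the price of this scheme.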

Regarding Discovery and large clusters. If your cluster is too big for the
ring-based TcpDiscoverySpi to work well then you should use Zookeeper
Discovery which was created specifically to support large clusters.

Stan

On Mon, Dec 9, 2019 at 4:02 PM Prasad Bhalerao <pr...@gmail.com>
wrote:

>
> Can someone please advise on this?

Fwd: Local node terminated after segmentation

Posted by Prasad Bhalerao <pr...@gmail.com>.
Can someone please advise on this?

---------- Forwarded message ---------
From: Prasad Bhalerao <pr...@gmail.com>
Date: Fri, Nov 29, 2019 at 7:53 AM
Subject: Re: Local node terminated after segmentation
To: <us...@ignite.apache.org>


I had checked the resource you mentioned, but I was confused by the GridGain
doc describing it as protection against split-brain, because if the node
is segmented the only thing one can do is stop/restart/noop.
I was just wondering how it provides protection against split-brain.
Now I think that by protection it means killing the segmented node(s), or
restarting them and bringing them back into the cluster.

Ignite uses TcpDiscoverySpi to send a heartbeat to the next node in the
ring to check whether that node is reachable.
So the question is: in what situation does one need additional ways to
check whether a node is reachable, using different resolvers?

Please let me know if my understanding is correct.

I had checked the code in the article you mentioned. It requires a node to
be configured in advance so that the resolver can check whether that node
is reachable from the local host. It does not check whether all the nodes
are reachable from the local host.
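
The kind of check described above can be sketched in plain Java as follows.
This is an assumption-laden illustration, not the article's code and not a
GridGain plugin: the class and method names are hypothetical, and the rule
"valid if any configured address answers" is just one possible policy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical reachability check against a pre-configured address list. */
final class ReachabilityCheck {
    /** Tries to open a TCP connection to the address within the timeout. */
    static boolean canConnect(InetSocketAddress addr, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(addr, timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /**
     * Treats the local segment as valid if at least one of the
     * pre-configured addresses answers the probe.
     */
    static boolean isValidSegment(List<InetSocketAddress> targets,
                                  Predicate<InetSocketAddress> probe) {
        return targets.stream().anyMatch(probe);
    }
}
```

A caller would pass something like a -> ReachabilityCheck.canConnect(a, 2000)
as the probe. The list of targets has to be known in advance, which is
exactly the limitation raised above.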

E.g. node1 will check node2, node2 will check node3, and node3 will check
node1 to complete the ring.
I am just wondering how to configure this plugin in a prod env with a large
cluster. I tried to check the GridGain doc to see if they have provided any
sample code for configuring their plugins, just to get an idea, but did not
find any.

Can you please advise?


Thanks,
Prasad

On Thu 28 Nov, 2019, 11:41 PM akurbanov <antkr.dev@gmail.com> wrote:

> Hello,
>
> Basically this is a mechanism to implement custom logical/network
> split-brain protection. Segmentation resolvers allow you to implement a way
> to determine whether a node has to be segmented/stopped/etc. in the method
> isValidSegment(), and possibly to use different combinations of resolvers
> within the processor.
>
> If you want to check out how it could be done, some articles/source samples
> that might give you good insight can easily be found on the web, like:
>
> https://medium.com/@aamargajbhiye/how-to-handle-network-segmentation-in-apache-ignite-35dc5fa6f239
>
> http://apache-ignite-users.70518.x6.nabble.com/Segmentation-Plugin-blog-or-article-td27955.html
>
> Questions 2-3 are described in the documentation; copying the link just to
> point out which page:
> https://apacheignite.readme.io/docs/critical-failures-handling
>
> The default answer to 2 is: Ignite doesn't ignore the node FailureType
> SEGMENTATION and calls the failure handler in this case. The actions that
> are taken are defined in the failure handler.
>
> The AbstractFailureHandler class has only SYSTEM_WORKER_BLOCKED and
> SYSTEM_CRITICAL_OPERATION_TIMEOUT ignored by default. However, you might
> override the failure handler and call .setIgnoredFailureTypes().
>
> Links:
> Extend this class:
> https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/failure/AbstractFailureHandler.java
> (check for custom implementations used in Ignite tests and how they are
> used).
>
> Sample from tests:
> https://github.com/apache/ignite/blob/master/modules/core/src/test/java/org/apache/ignite/failure/SystemWorkersBlockingTest.java
>
> Failure processor:
> https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/failure/FailureProcessor.java
>
> Best regards,
> Anton

Re: Local node terminated after segmentation

Posted by Prasad Bhalerao <pr...@gmail.com>.
I had checked the resource you mentioned, but I was confused by the GridGain
doc describing it as protection against split-brain, because once a node
is segmented the only thing one can do is stop, restart, or no-op.
I was wondering how that provides protection against split-brain.
Now I think "protection" means killing the segmented node(s), or restarting
them and bringing them back into the cluster.

Ignite uses TcpDiscoverySpi to send a heartbeat to the next node in the ring
to check whether that node is reachable.
So the question is: in what situation does one need additional ways to check
whether a node is reachable, using different resolvers?

Please let me know if my understanding is correct.

I had checked the code in the article you mentioned. It requires a node to
be configured in advance so that the resolver can check whether that node is
reachable from the local host. It does not check whether all the nodes are
reachable from the local host.

E.g. node1 will check node2, node2 will check node3, and node3 will check
node1 to complete the ring.
I am wondering how to configure this plugin in a production environment with
a large cluster. I checked the GridGain docs to see whether they provide any
sample code for configuring their plugins, just to get an idea, but did not
find any.

Can you please advise?


Thanks,
Prasad

On Thu 28 Nov, 2019, 11:41 PM akurbanov <antkr.dev@gmail.com> wrote:

> Hello,
>
> Basically this is a mechanism to implement custom logical/network
> split-brain protection. Segmentation resolvers allow you to implement a way
> to determine if node has to be segmented/stopped/etc in method
> isValidSegment() and possibly use different combinations of resolvers
> within
> processor.
>
> If you want to check out how it could be done, some articles/source samples
> that might give you a good insight may be easily found on the web, like:
>
> https://medium.com/@aamargajbhiye/how-to-handle-network-segmentation-in-apache-ignite-35dc5fa6f239
>
> http://apache-ignite-users.70518.x6.nabble.com/Segmentation-Plugin-blog-or-article-td27955.html
>
> 2-3 are described in the documentation, copying the link just to point out
> which one: https://apacheignite.readme.io/docs/critical-failures-handling
>
> By default answer to 2 is: Ignite doesn't ignore node FailureType
> SEGMENTATION and calls the failure handler in this case. Actions that are
> taken are defined in failure handler.
>
> AbstractFailureHandler class has only SYSTEM_WORKER_BLOCKED and
> SYSTEM_CRITICAL_OPERATION_TIMEOUT ignored by default. However, you might
> override the failure handler and call .setIgnoredFailureTypes().
>
> Links:
> Extend this class:
>
> https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/failure/AbstractFailureHandler.java
> — check for custom implementations used in Ignite tests and how they are
> used.
>
> Sample from tests:
>
> https://github.com/apache/ignite/blob/master/modules/core/src/test/java/org/apache/ignite/failure/SystemWorkersBlockingTest.java
>
> Failure processor:
>
> https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/failure/FailureProcessor.java
>
> Best regards,
> Anton

Re: Local node terminated after segmentation

Posted by akurbanov <an...@gmail.com>.
Hello,

Basically, this is a mechanism for implementing custom logical/network
split-brain protection. Segmentation resolvers let you implement, in the
isValidSegment() method, your own way of deciding whether the node has to be
treated as segmented (and then stopped, etc.), and you can combine several
resolvers within the processor.
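
As a rough illustration of the kind of check an isValidSegment()
implementation might delegate to, here is a minimal sketch in plain Java. It
is not Ignite or GridGain code: the class QuorumReachabilityCheck, its
methods and the minReachable threshold are hypothetical names chosen for this
example. The idea is to treat the local segment as valid only when enough
pre-configured peers are reachable, instead of relying on one fixed node.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper for illustration; not part of Ignite or GridGain. */
final class QuorumReachabilityCheck {
    private QuorumReachabilityCheck() {
    }

    /** Probe that reports true when a TCP connection succeeds within timeoutMs. */
    static Predicate<InetSocketAddress> tcpProbe(int timeoutMs) {
        return addr -> {
            try (Socket sock = new Socket()) {
                sock.connect(addr, timeoutMs);
                return true;
            } catch (IOException e) {
                // Refused, timed out or unresolved: count the peer as unreachable.
                return false;
            }
        };
    }

    /** True when at least minReachable of the given peers answer the probe. */
    static boolean isValidSegment(List<InetSocketAddress> peers,
                                  Predicate<InetSocketAddress> probe,
                                  int minReachable) {
        int reachable = 0;
        for (InetSocketAddress peer : peers) {
            // Stop probing as soon as the threshold is met.
            if (probe.test(peer) && ++reachable >= minReachable)
                return true;
        }
        return minReachable <= 0;
    }
}
```

For example, isValidSegment(peers, QuorumReachabilityCheck.tcpProbe(2000), 2)
would require two peers to accept a TCP connection within two seconds. The
probe is passed in as a Predicate so that a different check (ping, shared
storage, and so on) can be swapped in without touching the quorum logic.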

If you want to see how it could be done, articles and source samples
that give good insight are easy to find on the web, for example:
https://medium.com/@aamargajbhiye/how-to-handle-network-segmentation-in-apache-ignite-35dc5fa6f239
http://apache-ignite-users.70518.x6.nabble.com/Segmentation-Plugin-blog-or-article-td27955.html

Questions 2-3 are covered in the documentation; here is the link, just to
point out which page: https://apacheignite.readme.io/docs/critical-failures-handling

The answer to 2 is: by default, Ignite does not ignore the SEGMENTATION
FailureType and calls the failure handler in this case. The actions taken
are defined by the failure handler.

By default, AbstractFailureHandler ignores only SYSTEM_WORKER_BLOCKED and
SYSTEM_CRITICAL_OPERATION_TIMEOUT. However, you can override the failure
handler and call .setIgnoredFailureTypes().

Links:
Extend this class:
https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/failure/AbstractFailureHandler.java
(check the custom implementations used in Ignite tests and how they are
used).

Sample from tests: 
https://github.com/apache/ignite/blob/master/modules/core/src/test/java/org/apache/ignite/failure/SystemWorkersBlockingTest.java

Failure processor:
https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/failure/FailureProcessor.java

Best regards,
Anton






Re: Local node terminated after segmentation

Posted by Prasad Bhalerao <pr...@gmail.com>.
Hi,

Can someone please help me with the following questions?

1) If Ignite is capable of detecting node segmentation and taking a
STOP, RESTART_JVM or NOOP action based on the configured failure handlers,
then why do we need explicit SegmentationResolvers?

2) Does Ignite always treat node segmentation as a "Critical system error"
and use "StopNodeOrHaltFailureHandler" to take the required action, which is
"Terminate Node"?

3) Are there any other reasons for the "Critical system error detected"
message?

Thanks,
Prasad




On Wed, Nov 27, 2019 at 11:01 PM akurbanov <an...@gmail.com> wrote:

> Hello,
>
> Please refer to documentation on failure handler:
> https://apacheignite.readme.io/docs/critical-failures-handling.
>
> As it is correctly stated, we cannot restart the JVM without external
> tooling, by default we are doing this for nodes that were started with
> ignite.sh/bat so that Ignite start goes through
>
> https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/startup/cmdline/CommandLineStartup.java
>
> As for the segmentation, subscribe to
>
> https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/events/EventType.html#EVT_NODE_SEGMENTED
>
> Event listeners doc: https://apacheignite.readme.io/docs/events
>
> You will receive this event in the listener and after this you might do
> anything that you want with the JVM, easiest way is to exit JVM with some
> code and handle it outside of the application.
>
> Best regards,
> Anton

Re: Local node terminated after segmentation

Posted by akurbanov <an...@gmail.com>.
Hello,

Please refer to the documentation on failure handlers:
https://apacheignite.readme.io/docs/critical-failures-handling

As correctly stated, we cannot restart the JVM without external
tooling. By default we do this only for nodes that were started with
ignite.sh/bat, so that Ignite startup goes through
https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/startup/cmdline/CommandLineStartup.java

As for the segmentation, subscribe to
https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/events/EventType.html#EVT_NODE_SEGMENTED

Event listeners doc: https://apacheignite.readme.io/docs/events

You will receive this event in the listener, and after that you can do
anything you want with the JVM. The easiest way is to exit the JVM with some
exit code and handle it outside of the application.
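
The "exit with some code and handle it outside" approach can be sketched as
a small external launcher in plain Java. This is a sketch under assumptions,
not an Ignite facility: RestartSupervisor, the restart exit code and the
restart limit are hypothetical, and the embedded application is assumed to
call System.exit(restartCode) from its segmentation listener.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.function.IntSupplier;

/** Hypothetical external launcher for illustration; not part of Ignite. */
final class RestartSupervisor {
    private RestartSupervisor() {
    }

    /** Launcher that starts the command, waits for it and returns its exit code. */
    static IntSupplier processLauncher(List<String> command) {
        return () -> {
            try {
                Process child = new ProcessBuilder(command).inheritIO().start();
                return child.waitFor();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for the child JVM", e);
            }
        };
    }

    /**
     * Runs the launcher once, then again each time it returns restartCode,
     * up to maxRestarts additional runs. Returns the last exit code seen.
     */
    static int supervise(IntSupplier launcher, int restartCode, int maxRestarts) {
        int exitCode = launcher.getAsInt();
        for (int restarts = 0; exitCode == restartCode && restarts < maxRestarts; restarts++)
            exitCode = launcher.getAsInt();
        return exitCode;
    }
}
```

A possible invocation, with app.jar standing in for your own application and
250 as an arbitrarily chosen restart code, would be
RestartSupervisor.supervise(RestartSupervisor.processLauncher(
java.util.Arrays.asList("java", "-jar", "app.jar")), 250, 3). The restart
limit keeps a node that is segmented again immediately after every start
from looping forever.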

Best regards,
Anton




Re: Local node terminated after segmentation

Posted by Prasad Bhalerao <pr...@gmail.com>.
Why did you have to extend the stop handler?
Why couldn't you use the existing one provided by Ignite?

By the way, the question is about how to restart the JVM. A JVM can't restart
itself without the help of an outside tool or script.

Thanks,
Prasad


On Wed 27 Nov, 2019, 9:43 PM Surinder Mehra <rednirus@gmail.com> wrote:

> You can extend the stopnode handler and stop Java process when
> stopnodehandler is called by ignite.
> We did similar thing in our project

Re: Local node terminated after segmentation

Posted by Surinder Mehra <re...@gmail.com>.
You can extend the stop-node handler and stop the Java process when
the handler is called by Ignite.
We did a similar thing in our project.

On Wed, Nov 27, 2019, 17:00 Akash Shinde <ak...@gmail.com> wrote:

> Hi,
>
> I have started four server nodes. One of the nodes got terminated
> unexpectedly with the following error. Before the JVM terminated, the node
> was segmented.
>
> 1) Does Ignite always treat node segmentation as a "Critical system error"
> and use "StopNodeOrHaltFailureHandler" to take the required action, which is
> "Terminate Node" in this case?
>
> 2) Are there any other reasons for the "Critical system error detected"
> message?
>
> I have not set the SegmentationPolicy explicitly. AFAIK Ignite does not
> provide a SegmentationResolver and SegmentationPolicy out of the box.
>
> 3) Do I need to implement a SegmentationResolver and set the
> SegmentationPolicy to "STOP" if I want to stop the JVM when the node is
> segmented?
>
> 4) I am starting Ignite in embedded mode. When a node is segmented I want to
> restart the JVM. Is there any way to do this? (I am not using
> ignite.sh/ignite.bat to start Ignite.)
>
> Please find the attached logs.
>
> Exception:
>
> 2019-11-27 08:30:46,992 9321188 [disco-event-worker-#61%springDataNode%]
> WARN  o.a.i.i.m.d.GridDiscoveryManager - Local node SEGMENTED:
> TcpDiscoveryNode [id=b4fce076-cc7a-47ee-98fd-31e1d610b5de,
> addrs=[10.45.65.97, 127.0.0.1], sockAddrs=[/10.45.65.97:47500,
> /127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
> lastExchangeTime=1574843446983, loc=true, ver=2.6.0#20180710-sha1:669feacc,
> isClient=false]
>
> 2019-11-27 08:30:46,994 9321190 [tcp-disco-srvr-#3%springDataNode%] ERROR  -
> Critical system error detected. Will be handled accordingly to configured
> handler [hnd=class o.a.i.failure.StopNodeOrHaltFailureHandler,
> failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION,
> err=java.lang.IllegalStateException: Thread
> tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]]
> java.lang.IllegalStateException: Thread tcp-disco-srvr-#3%springDataNode%
> is terminated unexpectedly.
>         at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServer.body(ServerImpl.java:5686)
>         at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
>
> 2019-11-27 08:30:46,995 9321191 [tcp-disco-srvr-#3%springDataNode%] ERROR  -
> JVM will be halted immediately due to the failure:
> [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION,
> err=java.lang.IllegalStateException: Thread
> tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]]