
Posted to user@ignite.apache.org by kumaresan <ku...@hotmail.com> on 2017/05/25 17:11:03 UTC

Node is out of topology

Team,

We are using Ignite 1.9 with two cache servers on Linux. The same issue also
happened with version 1.5: one of the cache servers went out of the topology.
Could anyone please suggest possible causes I should check?

One scenario I have heard of is the VM going into a frozen state, but I could
not find a reference for it.

We get the following "Node is out of topology" error:
[18:12:55,224][INFO][grid-timeout-worker-#33%null%][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=f8e0d171, name=null]
    ^-- H/N/C [hosts=5, nodes=5, CPUs=14]
    ^-- CPU [cur=4.83%, avg=1.38%, GC=0%]
    ^-- Heap [used=2468MB, free=69%, comm=7963MB]
    ^-- Public thread pool [active=0, idle=16, qSize=0]
    ^-- System thread pool [active=0, idle=16, qSize=0]
    ^-- Outbound messages queue [size=0]
[18:15:05,700][INFO][tcp-disco-msg-worker-#2%null%][TcpDiscoverySpi] Local
node seems to be disconnected from topology (failure detection timeout is
reached) [failureDetectionTimeout=10000, connCheckFreq=3333]
[18:15:06,865][INFO][grid-timeout-worker-#33%null%][IgniteKernal] 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=f8e0d171, name=null]
    ^-- H/N/C [hosts=5, nodes=5, CPUs=14]
    ^-- CPU [cur=100%, avg=1.38%, GC=1078.23%]
    ^-- Heap [used=231MB, free=97.09%, comm=7957MB]
    ^-- Public thread pool [active=0, idle=16, qSize=0]
    ^-- System thread pool [active=3, idle=13, qSize=0]
    ^-- Outbound messages queue [size=0]
[18:15:10,778][WARNING][tcp-disco-msg-worker-#2%null%][TcpDiscoverySpi] Node
is out of topology (probably, due to short-time network problems).
[18:15:11,724][WARNING][tcp-disco-msg-worker-#2%null%][TcpDiscoverySpi] Node
is out of topology (probably, due to short-time network problems).
[18:15:11,908][WARNING][disco-event-worker-#48%null%][GridDiscoveryManager]
Local node SEGMENTED: TcpDiscoveryNode
[id=f8e0d171-2c6f-4688-9a89-525b3aa35574, addrs=[0:0:0:0:0:0:0:1%lo,
XX.XXX.X.XXX, 127.0.0.1],
sockAddrs=[dc4-pvm-resc-01.ext.ceb/XX.XXX.X.XXX:47500,
/0:0:0:0:0:0:0:1%lo:47500, /XX.XXX.X.XXX:47500, /127.0.0.1:47500],
discPort=47500, order=79, intOrder=42, lastExchangeTime=1494972911738,
loc=true, ver=1.9.0#20151229-sha1:f1f8cda2, isClient=false]
[18:15:13,516][SEVERE][tcp-disco-msg-worker-#2%null%][TcpDiscoverySpi]
TcpDiscoverSpi's message worker thread failed abnormally. Stopping the node
in order to prevent cluster wide instability.
java.lang.InterruptedException
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
        at
java.util.concurrent.LinkedBlockingDeque.pollFirst(LinkedBlockingDeque.java:522)
        at
java.util.concurrent.LinkedBlockingDeque.poll(LinkedBlockingDeque.java:684)
        at
org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:5779)
        at
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2161)
        at
org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
[18:15:15,130][WARNING][sys-#31%null%][GridCachePartitionExchangeManager]
Failed to send partitions full message [node=TcpDiscoveryNod

(XX.XXX.X.XXX = masked IP address)
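To illustrate the "failure detection timeout is reached" line in the log above, here is a minimal toy model. It assumes a peer is simply suspected once it has been silent for longer than failureDetectionTimeout (10000 ms in this log). The class FailureDetectionToy and its methods are hypothetical and are not how TcpDiscoverySpi is implemented. The model shows only the timing idea: a stop-the-world GC pause longer than the timeout keeps a node from sending or processing messages, which is enough to trip such a check.

```java
/**
 * Hypothetical toy model of timeout-based failure detection
 * (NOT Ignite's TcpDiscoverySpi code). A peer is suspected once
 * no message has been seen for longer than the timeout.
 */
class FailureDetectionToy {
    private final long failureDetectionTimeoutMs;
    private long lastMessageAtMs;

    FailureDetectionToy(long failureDetectionTimeoutMs, long startMs) {
        this.failureDetectionTimeoutMs = failureDetectionTimeoutMs;
        this.lastMessageAtMs = startMs;
    }

    /** Records that a message (e.g. a heartbeat) arrived at the given time. */
    void onMessage(long nowMs) {
        lastMessageAtMs = nowMs;
    }

    /** True when the silence since the last message exceeds the timeout. */
    boolean isSuspected(long nowMs) {
        return nowMs - lastMessageAtMs > failureDetectionTimeoutMs;
    }

    public static void main(String[] args) {
        // 10000 ms matches failureDetectionTimeout in the log above.
        FailureDetectionToy fd = new FailureDetectionToy(10_000, 0);
        fd.onMessage(2_000);
        // During a stop-the-world pause no messages are processed.
        System.out.println(fd.isSuspected(11_000)); // 9 s of silence
        System.out.println(fd.isSuspected(15_000)); // 13 s of silence
    }
}
```

Running main prints false (9 s of silence, under the timeout) and then true (13 s of silence, over it).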

Thanks,
Kumaresan



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Node-is-out-of-topology-tp13147.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Node is out of topology

Posted by vdpyatkov <vl...@gmail.com>.

Hi,

One way to get the node back into the topology is to stop it and restart it.

The node did not rejoin the cluster by itself because it is not possible to
merge data between a single node that was cut off (due to the GC pause) and
the rest of the topology. After the node left the cluster, some cache
operations (put, remove) may have been performed, so simply letting the node
return would lead to inconsistent data.
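The reasoning above can be illustrated with a hypothetical toy model that uses plain HashMaps instead of Ignite caches. The node keeps a snapshot of the data from the moment it left, the rest of the cluster keeps accepting put and remove operations, and a naive merge on rejoin brings stale data back. StaleRejoinToy and naiveMerge are illustrative names only. This is not Ignite's rebalancing logic.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical toy model (not Ignite's rebalancing logic): why a node that
 * was cut off cannot simply rejoin with the data it still holds.
 */
class StaleRejoinToy {
    /** Naive "rejoin": copy every entry from the stale node back into the cluster. */
    static Map<String, Integer> naiveMerge(Map<String, Integer> cluster, Map<String, Integer> stale) {
        Map<String, Integer> merged = new HashMap<>(cluster);
        merged.putAll(stale); // stale values overwrite newer ones, removed keys come back
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Integer> cluster = new HashMap<>();
        cluster.put("a", 1);
        cluster.put("b", 2);

        // Snapshot held by the node when it left (e.g. during a long GC pause).
        Map<String, Integer> stale = new HashMap<>(cluster);

        // Operations performed by the rest of the cluster while the node was out.
        cluster.put("a", 10);
        cluster.remove("b");

        Map<String, Integer> merged = naiveMerge(cluster, stale);
        System.out.println(merged.get("a"));         // 1: the update to 10 is lost
        System.out.println(merged.containsKey("b")); // true: the removed key is back
    }
}
```

In the toy, the consistent alternative is to discard `stale` entirely and start again from `cluster`. This mirrors the advice above to stop and restart the node.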



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Node-is-out-of-topology-tp13147p13372.html

Re: Node is out of topology

Posted by vinshar <vi...@gmail.com>.

I think the question is: why did the node not rejoin the topology after
garbage collection finished? A node can drop out of the topology for numerous
reasons, but shouldn't it rejoin once the issue is resolved?



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Node-is-out-of-topology-tp13147p13320.html

Re: Node is out of topology

Posted by vdpyatkov <vl...@gmail.com>.

Hi,

What exactly do you want to check? A process cannot tell by itself whether it
was frozen.
Also, you should set an alarm on this metric:

^-- CPU [cur=100%, avg=1.38%, *GC=1078.23%* ]

It means your node suffers from a lot of garbage (which leads to stop-the-world (STW) pauses).
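The advice to alarm on this metric could be implemented roughly as follows. This sketch scans the periodic "Metrics for local node" output for the GC figure and flags values above a threshold. The 10% threshold and the GcMetricAlarm class are hypothetical choices. The thread does not say exactly how Ignite computes the GC figure, and the log above shows it can exceed 100%. Treat it as a relative warning sign rather than an exact measure.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch of a log-based alarm on the GC figure in Ignite's periodic
 * metrics output. The threshold used in main is a hypothetical value.
 */
class GcMetricAlarm {
    // Matches e.g. "^-- CPU [cur=100%, avg=1.38%, GC=1078.23%]"
    private static final Pattern GC = Pattern.compile("CPU \\[.*GC=([0-9]+(?:\\.[0-9]+)?)%");

    /** Extracts the GC percentage from a metrics line, if present. */
    static OptionalDouble gcPercent(String line) {
        Matcher m = GC.matcher(line);
        return m.find() ? OptionalDouble.of(Double.parseDouble(m.group(1))) : OptionalDouble.empty();
    }

    /** True when the line reports a GC figure above the given threshold. */
    static boolean shouldAlarm(String line, double thresholdPercent) {
        OptionalDouble gc = gcPercent(line);
        return gc.isPresent() && gc.getAsDouble() > thresholdPercent;
    }

    public static void main(String[] args) {
        double threshold = 10.0; // hypothetical alarm threshold
        String healthy = "    ^-- CPU [cur=4.83%, avg=1.38%, GC=0%]";
        String paused = "    ^-- CPU [cur=100%, avg=1.38%, GC=1078.23%]";
        System.out.println(shouldAlarm(healthy, threshold));
        System.out.println(shouldAlarm(paused, threshold));
    }
}
```

Running main prints false for the healthy line and true for the line taken from the log in the original post.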

If you want to know when the node is segmented, you can subscribe to the
EVT_NODE_SEGMENTED event [1].

[1]: https://apacheignite.readme.io/docs/events#section-local-events



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Node-is-out-of-topology-tp13147p13310.html