Posted to user@ignite.apache.org by Ambha <am...@wipro.com> on 2016/01/18 06:30:43 UTC

issue with 2-node ignite server cluster

I have set up two Ignite server nodes running on Ubuntu 14.x with Java 8 and
created a grid, with discovery set to TcpDiscoveryVmIpFinder. One more
system runs on Windows with Java 8 in client mode.

I have modified the default Ignite config on both servers to include
discoverySpi, gridName... Starting the 2-node cluster works fine initially. But
when I connect the Windows client, one of the servers stops after a few minutes
and the other node displays "Failed to send message node may have left the grid".



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/issue-with-2-node-ignite-server-cluster-tp2600.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: issue with 2-node ignite server cluster

Posted by Denis Magda <dm...@gridgain.com>.
Hi,

This is a generic log message from a node that detected that another node
left the cluster abruptly.

I need all the logs from all the nodes you have. Please share them via
Dropbox or some other file sharing tool.

In general, a node can leave the topology because of long garbage collection
pauses. So activate garbage collection logs on all the nodes and share them
as well.
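
On a Java 8 HotSpot JVM, GC logging is usually switched on with startup
flags such as -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<path>,
where <path> is a placeholder for a log file location of your choice. As a
quick complementary check, and not a replacement for the GC logs requested
above, the accumulated GC time of a running JVM can also be read through the
standard java.lang.management API. The sketch below is only an
illustration; the class name GcStats and the method totalGcTimeMillis are
hypothetical and not part of Ignite.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper (not part of Ignite): reports GC activity of the current JVM. */
final class GcStats {
    /** Sums the approximate accumulated collection time, in ms, over all collectors. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long timeMs = gc.getCollectionTime(); // -1 if undefined for this collector
            if (timeMs > 0) {
                total += timeMs;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Print per-collector counters, then the total.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                + ": count=" + gc.getCollectionCount()
                + ", timeMs=" + gc.getCollectionTime());
        }
        System.out.println("total GC time, ms: " + totalGcTimeMillis());
    }
}
```

These counters are cumulative and do not show the length of individual
pauses, so the GC log files remain the better evidence for long pauses.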

Regards,
Denis




Re: issue with 2-node ignite server cluster

Posted by Ambha <am...@wipro.com>.
This is the error it displays:

**********
class org.apache.ignite.IgniteCheckedException: Failed to send message (node
may have left the grid or TCP connection cannot be established due to
firewall issues) [node=TcpDiscoveryNode
[id=4b13c224-21be-4c32-b29d-a0d7d079df10, addrs=[0:0:0:0:0:0:0:1%lo,
127.0.0.1, 192.168.186.247], sockAddrs=[/192.168.186.247:47500,
/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /192.168.186.247:47500],
discPort=47500, order=1, intOrder=1, lastExchangeTime=1453209329890,
loc=false, ver=1.4.0#20150924-sha1:c2def5f6, isClient=false],
topic=TOPIC_CACHE, msg=GridDhtPartitionsSingleMessage
[parts={-2100569601=GridDhtPartitionMap
[nodeId=3955b599-e49c-43f3-ad07-752b349db1ac, updateSeq=122, moving=0,
size=100], 689859866=GridDhtPartitionMap
[nodeId=3955b599-e49c-43f3-ad07-752b349db1ac, updateSeq=534, moving=0,
size=512], 1325947219=GridDhtPartitionMap
[nodeId=3955b599-e49c-43f3-ad07-752b349db1ac, updateSeq=42, moving=0,
size=20], 2034899268=GridDhtPartitionMap
[nodeId=3955b599-e49c-43f3-ad07-752b349db1ac, updateSeq=534, moving=0,
size=512], -1559869495=GridDhtPartitionMap
[nodeId=3955b599-e49c-43f3-ad07-752b349db1ac, updateSeq=534, moving=0,
size=512]}, client=false, super=GridDhtPartitionsAbstractMessage
[exchId=null, lastVer=GridCacheVersion [topVer=0, nodeOrderDrId=0,
globalTime=0, order=1453209329443], super=GridCacheMessage [msgId=26,
depInfo=null, err=null, skipPrepare=false]]], policy=2]
        at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1071)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1214)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.sendNoRetry(GridCacheIoManager.java:873)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.sendLocalPartitions(GridCachePartitionExchangeManager.java:760)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.refreshPartitions(GridCachePartitionExchangeManager.java:671)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.refreshPartitions(GridCachePartitionExchangeManager.java:690)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1800(GridCachePartitionExchangeManager.java:95)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1152)
        at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
        at java.lang.Thread.run(Thread.java:745)
Caused by: class org.apache.ignite.spi.IgniteSpiException: Failed to send
message to remote node: TcpDiscoveryNode
[id=4b13c224-21be-4c32-b29d-a0d7d079df10, addrs=[0:0:0:0:0:0:0:1%lo,
127.0.0.1, 192.168.186.247], sockAddrs=[/192.168.186.247:47500,
/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /192.168.186.247:47500],
discPort=47500, order=1, intOrder=1, lastExchangeTime=1453209329890,
loc=false, ver=1.4.0#20150924-sha1:c2def5f6, isClient=false]
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1940)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1880)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1066)
        ... 9 more
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to connect
to node (is node still alive?). Make sure that each GridComputeTask and
GridCacheTransaction has a timeout set in order to prevent parties from
waiting forever in case of network issues
[nodeId=4b13c224-21be-4c32-b29d-a0d7d079df10, addrs=[/192.168.186.247:47100,
/0:0:0:0:0:0:0:1%lo:47100, /127.0.0.1:47100]]
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2421)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2074)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:1978)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1914)
        ... 11 more
        Suppressed: class org.apache.ignite.IgniteCheckedException: Failed
to connect to address: /192.168.186.247:47100
                at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2426)
                ... 14 more
        Caused by: class org.apache.ignite.IgniteCheckedException: Failed to
read remote node recovery handshake (connection closed).
                at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2634)
                at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2293)
                ... 14 more
        Suppressed: class org.apache.ignite.IgniteCheckedException: Failed
to connect to address: /0:0:0:0:0:0:0:1%lo:47100
                at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2426)
                ... 14 more
        Caused by: class org.apache.ignite.IgniteCheckedException: Remote
node ID is not as expected [expected=4b13c224-21be-4c32-b29d-a0d7d079df10,
rcvd=3955b599-e49c-43f3-ad07-752b349db1ac]
                at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2539)
                at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2293)
                ... 14 more
        Suppressed: class org.apache.ignite.IgniteCheckedException: Failed
to connect to address: /127.0.0.1:47100
                at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2426)
                ... 14 more
        Caused by: class org.apache.ignite.IgniteCheckedException: Remote
node ID is not as expected [expected=4b13c224-21be-4c32-b29d-a0d7d079df10,
rcvd=3955b599-e49c-43f3-ad07-752b349db1ac]
                at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2539)
                at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2293)
                ... 14 more
[18:47:25,778][SEVERE][exchange-worker-#41%testGrid%][GridCachePartitionExchangeManager]
Failed to send local partition map to node [node=TcpDiscoveryNode
[id=4b13c224-21be-4c32-b29d-a0d7d079df10, addrs=[0:0:0:0:0:0:0:1%lo,
127.0.0.1, 192.168.186.247], sockAddrs=[/192.168.186.247:47500,
/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /192.168.186.247:47500],
discPort=47500, order=1, intOrder=1, lastExchangeTime=1453209329890,
loc=false, ver=1.4.0#20150924-sha1:c2def5f6, isClient=false], exchId=null]
class org.apache.ignite.IgniteCheckedException: Failed to send message (node
may have left the grid or TCP connection cannot be established due to
firewall issues) [node=TcpDiscoveryNode
[id=4b13c224-21be-4c32-b29d-a0d7d079df10, addrs=[0:0:0:0:0:0:0:1%lo,
127.0.0.1, 192.168.186.247], sockAddrs=[/192.168.186.247:47500,
/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /192.168.186.247:47500],
discPort=47500, order=1, intOrder=1, lastExchangeTime=1453209329890,
loc=false, ver=1.4.0#20150924-sha1:c2def5f6, isClient=false],
topic=TOPIC_CACHE, msg=GridDhtPartitionsSingleMessage
[parts={-2100569601=GridDhtPartitionMap
[nodeId=3955b599-e49c-43f3-ad07-752b349db1ac, updateSeq=122, moving=0,
size=100], 689859866=GridDhtPartitionMap
[nodeId=3955b599-e49c-43f3-ad07-752b349db1ac, updateSeq=534, moving=0,
size=512], 1325947219=GridDhtPartitionMap
[nodeId=3955b599-e49c-43f3-ad07-752b349db1ac, updateSeq=42, moving=0,
size=20], 2034899268=GridDhtPartitionMap
[nodeId=3955b599-e49c-43f3-ad07-752b349db1ac, updateSeq=534, moving=0,
size=512], -1559869495=GridDhtPartitionMap
[nodeId=3955b599-e49c-43f3-ad07-752b349db1ac, updateSeq=534, moving=0,
size=512]}, client=false, super=GridDhtPartitionsAbstractMessage
[exchId=null, lastVer=GridCacheVersion [topVer=0, nodeOrderDrId=0,
globalTime=0, order=1453209329443], super=GridCacheMessage [msgId=27,
depInfo=null, err=null, skipPrepare=false]]], policy=2]
        at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1071)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1214)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.sendNoRetry(GridCacheIoManager.java:873)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.sendLocalPartitions(GridCachePartitionExchangeManager.java:760)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.refreshPartitions(GridCachePartitionExchangeManager.java:671)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.refreshPartitions(GridCachePartitionExchangeManager.java:690)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1800(GridCachePartitionExchangeManager.java:95)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1152)
        at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
        at java.lang.Thread.run(Thread.java:745)
Caused by: class org.apache.ignite.spi.IgniteSpiException: Failed to send
message to remote node: TcpDiscoveryNode
[id=4b13c224-21be-4c32-b29d-a0d7d079df10, addrs=[0:0:0:0:0:0:0:1%lo,
127.0.0.1, 192.168.186.247], sockAddrs=[/192.168.186.247:47500,
/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /192.168.186.247:47500],
discPort=47500, order=1, intOrder=1, lastExchangeTime=1453209329890,
loc=false, ver=1.4.0#20150924-sha1:c2def5f6, isClient=false]
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1940)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1880)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1066)
        ... 9 more
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to connect
to node (is node still alive?). Make sure that each GridComputeTask and
GridCacheTransaction has a timeout set in order to prevent parties from
waiting forever in case of network issues
[nodeId=4b13c224-21be-4c32-b29d-a0d7d079df10, addrs=[/192.168.186.247:47100,
/0:0:0:0:0:0:0:1%lo:47100, /127.0.0.1:47100]]
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2421)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2074)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:1978)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1914)
        ... 11 more
        Suppressed: class org.apache.ignite.IgniteCheckedException: Failed
to connect to address: /192.168.186.247:47100
                at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2426)
                ... 14 more
        Caused by: class org.apache.ignite.IgniteCheckedException: Failed to
read remote node recovery handshake (connection closed).
                at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2634)
                at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2293)
                ... 14 more
        Suppressed: class org.apache.ignite.IgniteCheckedException: Failed
to connect to address: /0:0:0:0:0:0:0:1%lo:47100
                at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2426)
                ... 14 more
        Caused by: class org.apache.ignite.IgniteCheckedException: Remote
node ID is not as expected [expected=4b13c224-21be-4c32-b29d-a0d7d079df10,
rcvd=3955b599-e49c-43f3-ad07-752b349db1ac]
                at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2539)
                at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2293)
                ... 14 more
        Suppressed: class org.apache.ignite.IgniteCheckedException: Failed
to connect to address: /127.0.0.1:47100
                at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2426)
                ... 14 more
        Caused by: class org.apache.ignite.IgniteCheckedException: Remote
node ID is not as expected [expected=4b13c224-21be-4c32-b29d-a0d7d079df10,
rcvd=3955b599-e49c-43f3-ad07-752b349db1ac]
                at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2539)
                at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2293)
                ... 14 more
[18:47:29] Topology snapshot [ver=3, servers=2, clients=1, CPUs=5,
heap=9.3GB]
[18:47:29] Topology snapshot [ver=4, servers=1, clients=1, CPUs=3,
heap=5.3GB]
[18:47:29] Topology snapshot [ver=7, servers=1, clients=0, CPUs=1,
heap=4.0GB]
[18:47:31] Ignite node stopped OK [uptime=00:01:55:788]





Re: issue with 2-node ignite server cluster

Posted by Denis Magda <dm...@gridgain.com>.
Hi,

There are many reasons why this can happen. When the Windows client
is started, does it begin to use the cluster somehow? If it does, then your
code can lead to the failure of the server.

In any case, please share the logs from all the nodes with us, along with the
source of the client code that is executed across the cluster.
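
One of the causes named in the exception text in this thread is a TCP
connection that cannot be established, for example because of a firewall.
The log shows discovery on port 47500 and communication on port 47100, so
it may be worth checking from each machine that both ports on the other
machines accept connections. The sketch below uses only the JDK; the class
name PortCheck and the method isReachable are hypothetical and not part of
Ignite, and the host defaults to 127.0.0.1 only as a placeholder.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper (not part of Ignite): checks whether a TCP port accepts connections. */
final class PortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, timed out, unresolved host, or filtered: treated as not reachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass the remote node address as the first argument, e.g. a server IP.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int[] ports = {47500, 47100}; // discovery and communication ports seen in the log
        for (int port : ports) {
            System.out.println(host + ":" + port
                + " reachable=" + isReachable(host, port, 2000));
        }
    }
}
```

A successful connect only shows that something is listening on the port; it
does not prove that the listener is the expected Ignite node.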

Regards,
Denis 


