You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@ignite.apache.org by camer314 <ca...@towerswatson.com> on 2019/11/26 08:08:22 UTC

Ignite .NET and TcpDiscoveryStaticIpFinder failure

I am using 2 Azure virtual machines, both part of the same VNET. The network
security policy is applied at the network level and allows all traffic
in/out from the VNET.

I am trying to run a cluster across both machines. Seeing as Azure does not
support broadcast I am using a static IP list, which are the private IPs of
my two machines like such:

   <discoverySpi type='TcpDiscoverySpi'>
        <ipFinder type='TcpDiscoveryStaticIpFinder'>
            <endpoints>
                <string>10.0.2.5:47500..47509</string>
                <string>10.0.2.11:47500..47509</string>
            </endpoints>
        </ipFinder>
    </discoverySpi>

There is not much going on in the way of error messages but the connection
from 11 -> 5 must keep dropping or some other issue is preventing that node
joining.

On the 11 machine I have these log entries, indicating its found the other
node, but then seems to immediately lose the connection?

/[06:50:17,775][INFO][tcp-disco-msg-worker-#2][GridEncryptionManager]
Joining node doesn't have encryption data
[node=41518d17-a16a-48a0-9656-cd3d2b6e0042]
[06:50:17,800][INFO][tcp-disco-msg-worker-#2][TcpDiscoverySpi] New next node
[newNext=TcpDiscoveryNode [id=41518d17-a16a-48a0-9656-cd3d2b6e0042,
addrs=[0:0:0:0:0:0:0:1, 10.0.2.5, 10.0.75.1, 127.0.0.1],
sockAddrs=[camyakoubCPU/10.0.2.5:47500, /10.0.75.1:47500,
/0:0:0:0:0:0:0:1:47500, /127.0.0.1:47500,
host.docker.internal/10.0.2.11:47500], discPort=47500, order=0, intOrder=2,
lastExchangeTime=1574751017748, loc=false, ver=2.7.6#20190911-sha1:21f7ca41,
isClient=false]]
[06:50:22,779][INFO][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP discovery
accepted incoming connection [rmtAddr=/10.0.2.5, rmtPort=52234]
[06:50:22,779][INFO][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP discovery
spawning a new thread for connection [rmtAddr=/10.0.2.5, rmtPort=52234]
[06:50:22,780][INFO][tcp-disco-sock-reader-#7][TcpDiscoverySpi] Started
serving remote node connection [rmtAddr=/10.0.2.5:52234, rmtPort=52234]
[06:50:22,788][INFO][tcp-disco-sock-reader-#7][TcpDiscoverySpi] Finished
serving remote node connection [rmtAddr=/10.0.2.5:52234, rmtPort=52234
[06:50:22,805][WARNING][tcp-disco-msg-worker-#2][TcpDiscoverySpi] Failed to
send message to next node [msg=TcpDiscoveryNodeAddedMessage
[node=TcpDiscoveryNode [id=41518d17-a16a-48a0-9656-cd3d2b6e0042,
addrs=[0:0:0:0:0:0:0:1, 10.0.2.5, 10.0.75.1, 127.0.0.1],
sockAddrs=[camyakoubCPU/10.0.2.5:47500, /10.0.75.1:47500,
/0:0:0:0:0:0:0:1:47500, /127.0.0.1:47500,
host.docker.internal/10.0.2.11:47500], discPort=47500, order=0, intOrder=2,
lastExchangeTime=1574751017748, loc=false, ver=2.7.6#20190911-sha1:21f7ca41,
isClient=false],
dataPacket=o.a.i.spi.discovery.tcp.internal.DiscoveryDataPacket@8b3b66a,
discardMsgId=null, discardCustomMsgId=null, top=null, clientTop=null,
gridStartTime=1574750964272, super=TcpDiscoveryAbstractMessage
[sndNodeId=null, id=aa79676ae61-e3b1d49a-a023-435a-961c-13394c08ad0b,
verifierNodeId=e3b1d49a-a023-435a-961c-13394c08ad0b, topVer=0, pendingIdx=0,
failedNodes=null, isClient=false]], next=TcpDiscoveryNode
[id=41518d17-a16a-48a0-9656-cd3d2b6e0042, addrs=[0:0:0:0:0:0:0:1, 10.0.2.5,
10.0.75.1, 127.0.0.1], sockAddrs=[camyakoubCPU/10.0.2.5:47500,
/10.0.75.1:47500, /0:0:0:0:0:0:0:1:47500, /127.0.0.1:47500,
host.docker.internal/10.0.2.11:47500], discPort=47500, order=0, intOrder=2,
lastExchangeTime=1574751017748, loc=false, ver=2.7.6#20190911-sha1:21f7ca41,
isClient=false], errMsg=Failed to send message to next node
[msg=TcpDiscoveryNodeAddedMessage [node=TcpDiscoveryNode
[id=41518d17-a16a-48a0-9656-cd3d2b6e0042, addrs=[0:0:0:0:0:0:0:1, 10.0.2.5,
10.0.75.1, 127.0.0.1], sockAddrs=[camyakoubCPU/10.0.2.5:47500,
/10.0.75.1:47500, /0:0:0:0:0:0:0:1:47500, /127.0.0.1:47500,
host.docker.internal/10.0.2.11:47500], discPort=47500, order=0, intOrder=2,
lastExchangeTime=1574751017748, loc=false, ver=2.7.6#20190911-sha1:21f7ca41,
isClient=false],
dataPacket=o.a.i.spi.discovery.tcp.internal.DiscoveryDataPacket@8b3b66a,
discardMsgId=null, discardCustomMsgId=null, top=null, clientTop=null,
gridStartTime=1574750964272, super=TcpDiscoveryAbstractMessage
[sndNodeId=null, id=aa79676ae61-e3b1d49a-a023-435a-961c-13394c08ad0b,
verifierNodeId=e3b1d49a-a023-435a-961c-13394c08ad0b, topVer=0, pendingIdx=0,
failedNodes=null, isClient=false]], next=ClusterNode
[id=41518d17-a16a-48a0-9656-cd3d2b6e0042, order=0, addr=[0:0:0:0:0:0:0:1,
10.0.2.5, 10.0.75.1, 127.0.0.1], daemon=false]]]
[06:50:22,806][WARNING][tcp-disco-msg-worker-#2][TcpDiscoverySpi] Local node
has detected failed nodes and started cluster-wide procedure. To speed up
failure detection please see 'Failure Detection' section under javadoc for
'TcpDiscoverySpi'
[06:50:22,812][INFO][disco-event-worker-#42][GridDiscoveryManager] Added new
node to topology: TcpDiscoveryNode [id=41518d17-a16a-48a0-9656-cd3d2b6e0042,
addrs=[0:0:0:0:0:0:0:1, 10.0.2.5, 10.0.75.1, 127.0.0.1],
sockAddrs=[camyakoubCPU/10.0.2.5:47500, /10.0.75.1:47500,
/0:0:0:0:0:0:0:1:47500, /127.0.0.1:47500,
host.docker.internal/10.0.2.11:47500], discPort=47500, order=2, intOrder=2,
lastExchangeTime=1574751017748, loc=false, ver=2.7.6#20190911-sha1:21f7ca41,
isClient=false]
[06:50:22,812][INFO][disco-event-worker-#42][GridDiscoveryManager] Topology
snapshot [ver=2, locNode=e3b1d49a, servers=2, clients=0, state=ACTIVE,
CPUs=8, offheap=6.4GB, heap=7.1GB]
[06:50:22,816][INFO][exchange-worker-#43][time] Started exchange init
[topVer=AffinityTopologyVersion [topVer=2, minorTopVer=0],
mvccCrd=MvccCoordinator [nodeId=e3b1d49a-a023-435a-961c-13394c08ad0b,
crdVer=1574750964273, topVer=AffinityTopologyVersion [topVer=1,
minorTopVer=0]], mvccCrdChange=false, crd=true, evt=NODE_JOINED,
evtNode=41518d17-a16a-48a0-9656-cd3d2b6e0042, customEvt=null,
allowMerge=true]
[06:50:22,816][WARNING][disco-event-worker-#42][GridDiscoveryManager] Node
FAILED: TcpDiscoveryNode [id=41518d17-a16a-48a0-9656-cd3d2b6e0042,
addrs=[0:0:0:0:0:0:0:1, 10.0.2.5, 10.0.75.1, 127.0.0.1],
sockAddrs=[camyakoubCPU/10.0.2.5:47500, /10.0.75.1:47500,
/0:0:0:0:0:0:0:1:47500, /127.0.0.1:47500,
host.docker.internal/10.0.2.11:47500], discPort=47500, order=2, intOrder=2,
lastExchangeTime=1574751017748, loc=false, ver=2.7.6#20190911-sha1:21f7ca41,
isClient=false]
[06:50:22,819][INFO][disco-event-worker-#42][GridDiscoveryManager] Topology
snapshot [ver=3, locNode=e3b1d49a, servers=1, clients=0, state=ACTIVE,
CPUs=4, offheap=3.2GB, heap=3.6GB]
/

On the other machine the log shows only this:

/[06:50:22,774][WARNING][main][TcpDiscoverySpi] Node has not been connected
to topology and will repeat join process. Check remote nodes logs for
possible error messages. Note that large topology may require significant
time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration
property if getting this message on the starting nodes
[networkTimeout=5000]/

I tried changing the timeout to 2000 but it made no difference.

Any ideas? Seems like an incredibly simple setup so must be something
obvious.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Ignite .NET and TcpDiscoveryStaticIpFinder failure

Posted by camer314 <ca...@towerswatson.com>.
I created a new VM and it now seems to work ok, I have this message which
would indicate everything is connected (which would be correct, 3 nodes, 2
hosts and 6 CPU total)

H/N/C [hosts=2, nodes=3, CPUs=6]

Also btw, when creating the new VM I had to install Java, I chose
jdk-8u221-windows-x64 to match my working VM. When i tried to start
Apache.Ignite.exe I was faced with the error "Failed to load jvm.dll" even
though everything seemed correct. I them came across the below issue which
recommended installing VC++ 2010 redist however ptupitsyn suggested that was
not needed from 2.4+ and i am using the latest 2.7.6. It was only after
installing the VC++ 2010 libraries that Apache.Ignite.exe was able to run.

http://apache-ignite-users.70518.x6.nabble.com/Ignite-NET-2-2-crashes-on-startup-without-any-information-td17173i20.html



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Ignite .NET and TcpDiscoveryStaticIpFinder failure

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

You should make Discovery identical on both nodes, pointing to both nodes
via their public address.

[11:17:08,959][INFO][tcp-disco-sock-reader-#24][TcpDiscoverySpi] Started
serving remote node connection [rmtAddr=/10.0.2.5:53509, rmtPort=53509]
[11:17:08,965][INFO][tcp-disco-sock-reader-#24][TcpDiscoverySpi] Finished
serving remote node connection [rmtAddr=/10.0.2.5:53509, rmtPort=53509

This would point to some problem. Can you enable verbose node? ignite.sh
-v, or running with -DIGNITE_QUIET=false.

Regards,
-- 
Ilya Kasnacheev


вт, 26 нояб. 2019 г. в 14:25, camer314 <ca...@towerswatson.com>:

> I dont know about that, is that maybe because I had both addresses down as
> endpoints?
>
> I have modified app.config on both machines to only point to one another:
>
> /       <discoverySpi type="TcpDiscoverySpi" networkTimeout="30000"
> joinTimeout="60000">
>         <ipFinder type="TcpDiscoveryStaticIpFinder">
>             <endpoints>
>                 <string>10.0.2.5:47500..47502</string>
>             </endpoints>
>         </ipFinder>
> /
>
>        <discoverySpi type="TcpDiscoverySpi" networkTimeout="60000"
> joinTimeout="120000">
>         <ipFinder type="TcpDiscoveryStaticIpFinder">
>             <endpoints>
>                 <string>10.0.2.11:47500..47502</string>
>             </endpoints>
>         </ipFinder>
>     </discoverySpi>
>
>
> Now on the #11 machine the log is a lot cleaner, it just has these messages
> repeated:
>
> [11:17:08,959][INFO][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP discovery
> accepted incoming connection [rmtAddr=/10.0.2.5, rmtPort=53509]
> [11:17:08,959][INFO][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP discovery
> spawning a new thread for connection [rmtAddr=/10.0.2.5, rmtPort=53509]
> [11:17:08,959][INFO][tcp-disco-sock-reader-#24][TcpDiscoverySpi] Started
> serving remote node connection [rmtAddr=/10.0.2.5:53509, rmtPort=53509]
> [11:17:08,965][INFO][tcp-disco-sock-reader-#24][TcpDiscoverySpi] Finished
> serving remote node connection [rmtAddr=/10.0.2.5:53509, rmtPort=53509
>
> But on the #5 machine the log is entirely void of any detail at all:
>
> [11:14:56,534][INFO][main][IgniteKernal] Non-loopback local IPs:
> *10.0.2.5*,
> 10.0.75.1, fe80:0:0:0:0:5efe:a00:205%net2, fe80:0:0:0:0:5efe:a00:4b01%net7,
> fe80:0:0:0:35b8:b554:51fb:2fc1%eth5, fe80:0:0:0:812a:7edf:9502:2449%eth4
> [11:14:56,534][INFO][main][IgniteKernal] Enabled local MACs:
> 00000000000000E0, 000D3A18192C, 00155D02051F
> [11:14:56,571][INFO][main][TcpDiscoverySpi] Connection check threshold is
> calculated: 2000
> [11:14:56,588][INFO][main][TcpDiscoverySpi] Successfully bound to TCP port
> [port=47500, localHost=0.0.0.0/0.0.0.0,
> locNodeId=c77edb8f-56b8-405d-b2ba-66f585a30952]
>
>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>

Re: Ignite .NET and TcpDiscoveryStaticIpFinder failure

Posted by camer314 <ca...@towerswatson.com>.
I dont know about that, is that maybe because I had both addresses down as
endpoints?

I have modified app.config on both machines to only point to one another:

/       <discoverySpi type="TcpDiscoverySpi" networkTimeout="30000"
joinTimeout="60000">
        <ipFinder type="TcpDiscoveryStaticIpFinder">
            <endpoints>
                <string>10.0.2.5:47500..47502</string>
            </endpoints>
        </ipFinder>
/

       <discoverySpi type="TcpDiscoverySpi" networkTimeout="60000"
joinTimeout="120000">
        <ipFinder type="TcpDiscoveryStaticIpFinder">
            <endpoints>
                <string>10.0.2.11:47500..47502</string>
            </endpoints>
        </ipFinder>
    </discoverySpi>   


Now on the #11 machine the log is a lot cleaner, it just has these messages
repeated:

[11:17:08,959][INFO][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP discovery
accepted incoming connection [rmtAddr=/10.0.2.5, rmtPort=53509]
[11:17:08,959][INFO][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP discovery
spawning a new thread for connection [rmtAddr=/10.0.2.5, rmtPort=53509]
[11:17:08,959][INFO][tcp-disco-sock-reader-#24][TcpDiscoverySpi] Started
serving remote node connection [rmtAddr=/10.0.2.5:53509, rmtPort=53509]
[11:17:08,965][INFO][tcp-disco-sock-reader-#24][TcpDiscoverySpi] Finished
serving remote node connection [rmtAddr=/10.0.2.5:53509, rmtPort=53509

But on the #5 machine the log is entirely void of any detail at all:

[11:14:56,534][INFO][main][IgniteKernal] Non-loopback local IPs: *10.0.2.5*,
10.0.75.1, fe80:0:0:0:0:5efe:a00:205%net2, fe80:0:0:0:0:5efe:a00:4b01%net7,
fe80:0:0:0:35b8:b554:51fb:2fc1%eth5, fe80:0:0:0:812a:7edf:9502:2449%eth4
[11:14:56,534][INFO][main][IgniteKernal] Enabled local MACs:
00000000000000E0, 000D3A18192C, 00155D02051F
[11:14:56,571][INFO][main][TcpDiscoverySpi] Connection check threshold is
calculated: 2000
[11:14:56,588][INFO][main][TcpDiscoverySpi] Successfully bound to TCP port
[port=47500, localHost=0.0.0.0/0.0.0.0,
locNodeId=c77edb8f-56b8-405d-b2ba-66f585a30952]






--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Ignite .NET and TcpDiscoveryStaticIpFinder failure

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

Something is not right here:
sockAddrs=[camyakoubCPU/10.0.2.5:47500,
/10.0.75.1:47500, /0:0:0:0:0:0:0:1:47500, /127.0.0.1:47500,
host.docker.internal/10.0.2.11:47500],

Why does node think that it is both 10.0.2.5 and 10.0.2.11? Aren't those
supposed to be different addresses?

Please provide full network configuration from both nodes.

Regards,
-- 
Ilya Kasnacheev


вт, 26 нояб. 2019 г. в 11:32, camer314 <ca...@towerswatson.com>:

> I just realised this is the same situation that is logged here  here
> <
> http://apache-ignite-users.70518.x6.nabble.com/Unable-to-connect-to-a-remote-apache-ignite-node-tt30367.html>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>

Re: Ignite .NET and TcpDiscoveryStaticIpFinder failure

Posted by camer314 <ca...@towerswatson.com>.
I just realised this is the same situation that is logged here  here
<http://apache-ignite-users.70518.x6.nabble.com/Unable-to-connect-to-a-remote-apache-ignite-node-tt30367.html>  



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/