Posted to user@ignite.apache.org by ianhamilton_modelshop <ia...@modelshop.com> on 2018/07/31 00:41:31 UTC

Can't connect client to server after client shuts down the first time

Hello,

I'm doing a POC to see if Ignite is suitable for my company's application. 
While doing this, I have created the following environment:

Configuration:

Ignite version: 2.6.0
Java version used:  Java(TM) SE Runtime Environment 1.8.0_171-b11 Oracle
Corporation Java HotSpot(TM) 64-Bit Server VM 25.171-b11
OS: Windows 10 (local dev env)

Server: running an Ignite server via the $IGNITE_HOME/bin/ignite.bat script.
Client: JUnit session running in IntelliJ, using Ignite's Java API to attach
to the server, run in client mode, and activate the cluster once initially
connected.
Configuration: see attached zip file, file ignitepoc.xml.  Both the client
and server use the same configuration.
PartitionExchangeProblemWhenReconnecting.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t1951/PartitionExchangeProblemWhenReconnecting.zip>  

What's Happening

Initial client run - ok

1. Start server up - server starts ok
2. Run client - client is able to connect to server and run test to
completion.  Client also explicitly calls Ignite.close() to shut down
cleanly.
During the client execution, it:
* Destroys any existing copy of the test cache from prior runs
* Creates a new test cache
* Loads 100K items into that cache using a DataStreamer
* Reads all items in the cache using an Iterator obtained from the cache
* Reads 100K items at random using the cache's get() method

Logs from this step are available in the attached zip file - file names
ClientLog-FirstRun-Success.txt, ServerLog-FirstRun-Success.txt

Second client run - trouble starts

The server remains up and running from the first run.
3. Run client again.

*The problem here is that the client never successfully connects to the
server.*
The server fails to respond to one of the messages sent by the client,
and I see the following exception in the server log:

2018-07-30 18:08:05.494 [exchange-worker-#42] ERROR
o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture.error:137 - Failed to
reinitialize local partitions (preloading will be stopped):
GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=4,
minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode
[id=207a9b5d-0305-405b-9aee-32b7cbee7163, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1,
172.27.225.23, 192.168.52.92], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0,
ip-172-27-225-23.ec2.internal/172.27.225.23:0,
ip-192-168-52-92.ec2.internal/192.168.52.92:0], discPort=0, order=4,
intOrder=3, lastExchangeTime=1532988478915, loc=false,
ver=2.6.0#20180710-sha1:669feacc, isClient=true], topVer=4,
nodeId8=798ca779, msg=Node joined: TcpDiscoveryNode
[id=207a9b5d-0305-405b-9aee-32b7cbee7163, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1,
172.27.225.23, 192.168.52.92], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0,
ip-172-27-225-23.ec2.internal/172.27.225.23:0,
ip-192-168-52-92.ec2.internal/192.168.52.92:0], discPort=0, order=4,
intOrder=3, lastExchangeTime=1532988478915, loc=false,
ver=2.6.0#20180710-sha1:669feacc, isClient=true], type=NODE_JOINED,
tstamp=1532988478959], nodeId=207a9b5d, evt=NODE_JOINED]
java.lang.NullPointerException: null
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$11.apply(GridCacheDatabaseSharedManager.java:1243) ~[ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$11.apply(GridCacheDatabaseSharedManager.java:1239) ~[ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383) ~[ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353) ~[ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.rebuildIndexesIfNeeded(GridCacheDatabaseSharedManager.java:1239) ~[ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:1711) ~[ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:126) ~[ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:451) ~[ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:729) ~[ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2419) [ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2299) [ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) [ignite-core-2.6.0.jar:2.6.0]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_171]

Logs from this step are available in the attached zip file - file names
ClientLog-SecondRun-ClientCannotConnect.txt,
ServerLog-SecondRun-ClientCannotConnect.txt

From stepping through the server code using a debugger, I can see that the
usrFut variable is null on GridCacheDatabaseSharedManager.java:1243.

But I have no idea whether that is the problem or if my setup should not
have even gotten into that area of the code.
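
For readers following the trace: the frames listen -> notifyListener -> apply
suggest that the listener ran synchronously on the exchange-worker thread,
which is what typically happens when a listener is attached to a future that
has already completed. The sketch below illustrates only that general pattern
with plain JDK classes. MiniFuture, listenThrowsNpe and the captured Runnable
are hypothetical names invented for this illustration; this is not Ignite's
GridFutureAdapter or the code at GridCacheDatabaseSharedManager.java:1243.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ListenerNpeSketch {
    /** Hypothetical minimal future; NOT Ignite's GridFutureAdapter. */
    static final class MiniFuture<T> {
        private final List<Consumer<T>> listeners = new ArrayList<>();
        private boolean done;
        private T result;

        /** Runs the listener immediately, on the caller's thread, if already done. */
        synchronized void listen(Consumer<T> listener) {
            if (done)
                listener.accept(result);
            else
                listeners.add(listener);
        }

        /** Marks the future as done and notifies listeners registered so far. */
        synchronized void complete(T value) {
            done = true;
            result = value;

            for (Consumer<T> l : listeners)
                l.accept(value);

            listeners.clear();
        }
    }

    /** Returns true if registering the listener threw a NullPointerException. */
    static boolean listenThrowsNpe(Runnable captured) {
        MiniFuture<String> rebuildFut = new MiniFuture<>();

        rebuildFut.complete("rebuilt"); // the future finishes before listen() is called

        try {
            // The listener dereferences a captured reference that may be null.
            rebuildFut.listen(res -> captured.run());

            return false;
        }
        catch (NullPointerException e) {
            return true; // the NPE surfaces in the thread that called listen()
        }
    }

    public static void main(String[] args) {
        System.out.println(listenThrowsNpe(null));      // captured reference is null
        System.out.println(listenThrowsNpe(() -> { })); // captured reference is set
    }
}
```

In this toy version the exception escapes from listen() into whatever code
registered the listener, which matches the shape of the trace but says nothing
about why usrFut was null in the first place.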

I had to kill the client to stop it; otherwise it waits indefinitely for
the response to come back.

Try to run client one more time - still a problem

The server is still up and running from before; it hasn't been restarted.
4. Try running the client again.

Here again, the client hangs.  I don't seem to see the NPE like before.  But
it continually waits for a response from the server, and I have to kill it.

Logs from this step are available in the attached zip file - file names
ClientLog-ThirdRun-ClientStillCannotConnect.txt,
ServerLog-ThirdRun-ClientStillCannotConnect.txt

The client will not successfully connect to the server unless I restart the
server.  Then the pattern of events shown above repeats itself - first time
the client can connect, but subsequent times it hangs.
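
Since the hung client has to be killed by hand, one way to make a test fail
instead of hanging is to bound the start call with a timeout. The following is
a minimal sketch using only JDK classes. startsWithin and the startClient
callable are hypothetical names; startClient stands in for whatever call
starts the client node, and nothing here is part of Ignite's API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedStart {
    /**
     * Runs startClient on a daemon thread and waits at most timeoutMs for it.
     * Returns true if the call finished in time, false if it was still blocked.
     */
    static boolean startsWithin(Callable<?> startClient, long timeoutMs) {
        ExecutorService exec = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "client-start");
            t.setDaemon(true); // a stuck start call must not keep the JVM alive
            return t;
        });

        Future<?> fut = exec.submit(startClient);

        try {
            fut.get(timeoutMs, TimeUnit.MILLISECONDS);

            return true;
        }
        catch (TimeoutException e) {
            return false;
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller

            return false;
        }
        catch (ExecutionException e) {
            throw new IllegalStateException("start call failed", e.getCause());
        }
        finally {
            exec.shutdownNow(); // interrupts the worker; a blocked call may ignore it
        }
    }

    public static void main(String[] args) {
        // Stand-in for a start call that returns promptly.
        System.out.println(startsWithin(() -> "connected", 1_000));

        // Stand-in for a start call that stays blocked.
        System.out.println(startsWithin(() -> { Thread.sleep(60_000); return "connected"; }, 200));
    }
}
```

This only keeps the test run from blocking forever. It does not release
anything the blocked call holds, so a server restart may still be needed
afterwards.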

*Could someone please help?  Is this a bug, or have I messed up something in
the configuration?*



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Can't connect client to server after client shuts down the first time

Posted by ianhamilton_modelshop <ia...@modelshop.com>.

Thanks for the help guys.
I've since switched over to using Ignite embedded into my client vs. calling
out to a server.

For the record:
1. I don't think I reused the IgniteConfiguration java object.  My server
and my client both used the same XML configuration file, yes.  But in my
client code, I also set:

Ignition.setClientMode(true);
igniteClient.cluster().active(true);

There was only ever 1 client running in my testing.

PS - sorry for the late response.  I would've thought I would get an e-mail
when anyone replied.  Looks like I have to check my configuration...


Re: Can't connect client to server after client shuts down the first time

Posted by Ilya Kasnacheev <il...@gmail.com>.

Hello!

Please make sure you're not reusing an IgniteConfiguration object together
with its SPI objects. Some of those are not reusable: once they have been
used in a node, they should not be used to start a new one.
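
The general idea can be sketched with plain JDK classes: build a new
configuration for every node start instead of passing the same instance
around. OneShotSpi, NodeConfig, startNode and canStartTwice are hypothetical
stand-ins invented for this illustration, not Ignite classes, and the
exception thrown on reuse is an assumption made for the example; Ignite may
fail differently, or less visibly, when an SPI instance is reused.

```java
import java.util.function.Supplier;

public class FreshConfigSketch {
    /** Hypothetical stand-in for a stateful SPI object that may be started only once. */
    static final class OneShotSpi {
        private boolean started;

        void start() {
            if (started)
                throw new IllegalStateException("SPI instance was already used by another node");

            started = true;
        }
    }

    /** Hypothetical stand-in for a node configuration that owns an SPI instance. */
    static final class NodeConfig {
        final OneShotSpi spi = new OneShotSpi();
    }

    /** Starts a "node" from whatever configuration the factory hands out. */
    static void startNode(Supplier<NodeConfig> cfgFactory) {
        cfgFactory.get().spi.start();
    }

    /** Returns true if starting two nodes from the given factory succeeds. */
    static boolean canStartTwice(Supplier<NodeConfig> cfgFactory) {
        try {
            startNode(cfgFactory);
            startNode(cfgFactory);

            return true;
        }
        catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        NodeConfig shared = new NodeConfig();

        System.out.println(canStartTwice(() -> shared));    // same object reused for both starts
        System.out.println(canStartTwice(NodeConfig::new)); // new object built for each start
    }
}
```

Passing a factory rather than an instance makes it hard to share SPI state
between starts by accident.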

Regards,

-- 
Ilya Kasnacheev


Re: Can't connect client to server after client shuts down the first time

Posted by Evgenii Zhuravlev <e....@gmail.com>.

From the log I see that something hung even after the first client start,
which is why the new client wasn't able to join the cluster. Could you share
a reproducer? I mean the Java code.

Evgenii
