Posted to user@ignite.apache.org by "conway.ken" <co...@gmail.com> on 2016/04/26 15:40:25 UTC

Ignite Client Blocks On Ignite Server Restart

Hello, I'm having a problem with Ignite clients blocking indefinitely after
an Ignite server node is restarted. The problem occurs when running in a
cluster, while using Ignite as a simple distributed cache.  This problem
only seems to manifest itself when under load.

My configuration includes 3 Ignite server nodes running on 3 separate
physical hosts, all on the same subnet.  Each physical host also runs 3
separate applications, each of which is configured as an Ignite client
node.  In effect, I have a cluster that, when fully started, has a topology
snapshot of [ver=14, servers=3, clients=9, CPUs=24, heap=18.0GB].  Each
server node has 4GB heap allocated.  Each client node has 512m heap
allocated.

The load test accesses the applications that are configured as Ignite
client nodes, and results in approximately 700 cache transactions
(350 gets and 350 puts) per second, with a cache payload of approximately
2k each.

I've tried a few configuration adjustments but have not found one that
resolves my problem and I'm hoping someone might be able to point out what
is causing it.

The steps I perform to produce the problem are as follows:

   1. Start Ignite servers on host01, host02 and host03
   2. Start client applications on host01, host02 and host03
   3. Start load test application
   4. Stop Ignite server on host03
   5. Wait a few minutes and then restart Ignite server on host03

Upon restart of the Ignite server on host03, all client requests then block
indefinitely.

Please see the attached files for the configurations being used.

If anyone can help me out with what I have done incorrectly, I would really
appreciate it.

Thanks,
Ken


ignite-server-config.xml (1K) <http://apache-ignite-users.70518.x6.nabble.com/attachment/4554/0/ignite-server-config.xml>
IgniteCacheFactory.java (1K) <http://apache-ignite-users.70518.x6.nabble.com/attachment/4554/1/IgniteCacheFactory.java>
IgniteConfig.java (2K) <http://apache-ignite-users.70518.x6.nabble.com/attachment/4554/2/IgniteConfig.java>




--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Ignite-Client-Blocks-On-Ignite-Server-Restart-tp4554.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Ignite Client Blocks On Ignite Server Restart

Posted by vkulichenko <va...@gmail.com>.
Hi Colin,

I'm not sure that you have the same issue that Ken has. Is it possible for
you to share your test with us?

-Val




Re: Ignite Client Blocks On Ignite Server Restart

Posted by colinc <co...@yahoo.co.uk>.
In the case of the test that I am executing, high contention caused by the
test running as an Ignite client is causing not only the client but the
whole cluster to become unresponsive upon subsequent destruction of the
cache. The only way to get it to respond again seems to be to kill the
client JVM - but that depends on identifying which client is causing the
problem. Looking at the processor activity etc. doesn't seem to help because
the cluster is not busy at this point - rather it is just hung along with
everything else.

Because it's just a test that I'm running, I have the luxury of being able
to kill the old clients and then test the cache for responsiveness by
running a new client. As noted, this is a big improvement on v1.5 where the
only way to recover was to restart the cluster. I'd be interested to know if
v1.6 solved Ken's issue - which is probably a more common situation in
production than mine.

Regards,
Colin.




Re: Ignite Client Blocks On Ignite Server Restart

Posted by vkulichenko <va...@gmail.com>.
Hi,

Usually such monitoring is done by the application code, because it depends
on the application's logic. What are the particular cases in which you want
to kill a client?
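
As a minimal sketch of what application-level monitoring could look like:
the helper below bounds every cache call with a timeout and counts
consecutive timeouts, so the application can decide when a client looks
hung. It uses only the JDK. The class name CacheCallWatchdog, the
CallTimedOutException type, and the threshold parameters are all
hypothetical and not part of Ignite.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper (not part of Ignite): bounds each cache call and
// counts consecutive timeouts so the application can decide what to do.
class CacheCallWatchdog {
    // Unchecked so callers can choose where to handle it.
    static class CallTimedOutException extends RuntimeException {
        CallTimedOutException(String msg) { super(msg); }
    }

    private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "cache-call");
        t.setDaemon(true); // a hung call must not keep the JVM alive
        return t;
    });
    private final long timeoutMillis;
    private final int maxConsecutiveTimeouts;
    private final AtomicInteger consecutiveTimeouts = new AtomicInteger();

    CacheCallWatchdog(long timeoutMillis, int maxConsecutiveTimeouts) {
        this.timeoutMillis = timeoutMillis;
        this.maxConsecutiveTimeouts = maxConsecutiveTimeouts;
    }

    // Runs the call on a worker thread and waits at most timeoutMillis.
    <T> T call(Supplier<T> cacheCall) {
        Callable<T> task = cacheCall::get;
        Future<T> future = pool.submit(task);
        try {
            T result = future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            consecutiveTimeouts.set(0); // any success resets the counter
            return result;
        } catch (TimeoutException e) {
            future.cancel(true); // best effort: interrupts the worker thread
            consecutiveTimeouts.incrementAndGet();
            throw new CallTimedOutException(
                "no response within " + timeoutMillis + " ms");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("cache call failed", e.getCause());
        }
    }

    // True once enough calls in a row have timed out.
    boolean looksHung() {
        return consecutiveTimeouts.get() >= maxConsecutiveTimeouts;
    }
}
```

With a real client, the supplier would wrap something like cache.get(key);
once looksHung() returns true the application could log, alert, or shut the
client down. Note that cancel(true) only interrupts the worker thread, and
this sketch does not assume that a blocked cache call reacts to that.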

-Val




Re: Ignite Client Blocks On Ignite Server Restart

Posted by colinc <co...@yahoo.co.uk>.
Thanks for the tip. I'm happy to report that the 1.6 version is considerably
more reliable than 1.5. Although I am still able to break it under high
enough levels of contention, it is a lot harder to do. Also, it generally
recovers when the client is killed (as opposed to the whole cluster) -
though I'm unsure as to the best way to monitor and kill clients if this
were to happen in production.




Re: Ignite Client Blocks On Ignite Server Restart

Posted by vkulichenko <va...@gmail.com>.
BTW, if you'd rather not build manually, you can download the nightly build here:
https://builds.apache.org/view/H-L/view/Ignite/job/Ignite-nightly/lastSuccessfulBuild/

-Val




Re: Ignite Client Blocks On Ignite Server Restart

Posted by vkulichenko <va...@gmail.com>.
Hi Ken, Colin,

The first thing I would recommend is to build from master and run your code
with the resulting build. There have been a lot of stability fixes since 1.5
and there is a good chance that your issues are already fixed. All these
changes will be released in 1.6.

-Val




Re: Ignite Client Blocks On Ignite Server Restart

Posted by colinc <co...@yahoo.co.uk>.
I'm experiencing something very similar to this. In my case, I have a load
test that is causing transaction contention. I don't see the problem when
transactions are switched off, even at high load. The transactions are
cross-cache if that's relevant at all.

The contention causes (expected) errors like the one below but the cluster
continues to work as normal until (in my case) destroyCache() is called. I'm
doing this in order to test different cache configurations.

At this point, the cluster effectively stops responding. Operations from
client nodes are not serviced - even if new nodes are added to the cluster -
until all the original nodes are killed.

I have been unable to replicate the problem with a simple test - even one
that creates e.g. an OptimisticLockFailureException. It seems to require
this level of contention before the problem occurs.

Failed to execute compound future reducer: Compound future listener []class org.apache.ignite.internal.transactions.IgniteTxTimeoutCheckedException: Failed to acquire lock within provided timeout for transaction [timeout=1000, tx=org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocalAdapter$1@49550672]
	at org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter$PostLockClosure1.apply(IgniteTxLocalAdapter.java:3943)
	at org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter$PostLockClosure1.apply(IgniteTxLocalAdapter.java:3895)
	at org.apache.ignite.internal.util.future.GridEmbeddedFuture$2.applyx(GridEmbeddedFuture.java:91)
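
For readers unfamiliar with this kind of error, here is a JDK-only analogy
of how a lock timeout arises under contention. It is not Ignite code and
says nothing about Ignite's internals: one thread stands in for a
transaction that holds the lock on a key, and a second waits a bounded time
for the same lock. The class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; a plain JDK lock stands in for a key lock.
class LockTimeoutSketch {
    // "tx-1" holds the lock for holdMillis while the caller ("tx-2") waits
    // at most timeoutMillis. Returns true if tx-2 acquired the lock.
    static boolean secondTxAcquires(long holdMillis, long timeoutMillis) {
        ReentrantLock keyLock = new ReentrantLock();
        CountDownLatch held = new CountDownLatch(1);

        Thread first = new Thread(() -> {
            keyLock.lock();
            try {
                held.countDown();         // signal: the lock is now held
                Thread.sleep(holdMillis); // simulate a slow transaction body
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                keyLock.unlock();
            }
        }, "tx-1");
        first.setDaemon(true);
        first.start();

        try {
            held.await(); // make sure tx-1 owns the lock before contending
            if (!keyLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false; // gave up: lock not acquired within the timeout
            }
            keyLock.unlock();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Timeout shorter than the hold time: the second "transaction" gives up.
        System.out.println(secondTxAcquires(500, 50));
        // Timeout much longer than the hold time: it eventually gets the lock.
        System.out.println(secondTxAcquires(10, 2000));
    }
}
```

The point of the analogy is only that the timeout errors are an expected
result of waiting a bounded time for a contended key; it does not explain
why the cluster stops responding after destroyCache().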

Regards,
Colin.


