Posted to user@ignite.apache.org by John <fa...@gmail.com> on 2016/05/15 12:59:42 UTC

Ignite Deadlock

Hi.

I have 2 ignite instances that use IgniteCache to store some cache values.
The cache is configured with replication on, so both instances have the
same data.

Since I am running JNI code to get the cache values, the process occasionally
crashes, which in turn kills the Ignite instance. I have an external script
that restarts the failed Ignite instance as soon as it crashes.

I was expecting the non-crashed Ignite instance (ignite1) to quickly update
the restarted instance (ignite2) and both to continue working as usual.

This worked exactly as expected for a few days, until one time ignite2
crashed and ignite1 seemed to get into a deadlock. When ignite2 came back
up, it failed to recognize ignite1 and failed to replicate from it. All
client connections to the Ignite instances stopped working as well.

I am seeing this error in the log:

Failed to wait for initial partition map exchange. Possible reasons are:
  ^-- Transactions in deadlock.
  ^-- Long running transactions (ignore if this is the case).
  ^-- Unreleased explicit locks.

and also:

Local node has detected failed nodes and started cluster-wide procedure. To
speed up failure detection please see 'Failure Detection' section under
javadoc for 'TcpDiscoverySpi'


I am using Ignite v1.4.
Any suggestions or ideas will be highly appreciated.

Thanks!
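One of the causes the log message lists is unreleased explicit locks. In Ignite, `IgniteCache.lock(key)` returns a standard `java.util.concurrent.locks.Lock` (explicit locks are available on TRANSACTIONAL caches), so the usual Java rule applies: release the lock in a `finally` block so that an exception inside the locked section cannot leave the key locked. The sketch below assumes nothing Ignite-specific; `ExplicitLocks` and `withLock` are hypothetical names, and a plain `ReentrantLock` stands in for the Ignite lock. Note that this only guards against exceptions; it does nothing if the whole JVM dies, as in the JNI crash described above.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical helper (not part of Ignite): run a critical section while holding
// an explicit lock and always release the lock, even if the section throws.
final class ExplicitLocks {
    private ExplicitLocks() {}

    static <T> T withLock(Lock lock, Supplier<T> body) {
        lock.lock();            // blocks until the lock is acquired
        try {
            return body.get();  // the work done under the lock
        } finally {
            lock.unlock();      // runs on normal return and on exceptions
        }
    }

    public static void main(String[] args) {
        // ReentrantLock stands in for the Lock returned by IgniteCache.lock(key),
        // e.g. (assuming a TRANSACTIONAL IgniteCache<String, Integer> named cache):
        //   Integer v = ExplicitLocks.withLock(cache.lock("k"), () -> cache.get("k"));
        ReentrantLock lock = new ReentrantLock();
        try {
            withLock(lock, () -> { throw new IllegalStateException("simulated failure"); });
        } catch (IllegalStateException expected) {
            // the failure still reaches the caller...
        }
        // ...but the finally block has already released the lock
        System.out.println("still locked: " + lock.isLocked());
    }
}
```

The helper takes the `Lock` interface rather than a concrete class, so the same try/finally discipline applies whether the lock is local or comes from a cache.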

Re: Ignite Deadlock

Posted by levaly <fa...@gmail.com>.

Unfortunately, I don't have the problematic instance running anymore, so I
cannot provide the logs or thread dumps.

I will wait a week for the release of version 1.6 and hope it resolves this
issue. If I manage to reproduce the problem, I will post the logs.

Thanks.



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Ignite-Deadlock-tp4944p4969.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Ignite Deadlock

Posted by Denis Magda <dm...@gridgain.com>.

Hi,

First of all, why do you need to interact with the caches using JNI? We could probably recommend a simpler and safer approach.

Second, it’s hard to tell why ignite1 gets into a deadlock without the following:
- logs from all the nodes;
- thread dumps of all the nodes;
- the configuration you use.

Please provide this info if you need assistance.
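Thread dumps are usually taken with the JDK's `jstack <pid>` tool. Where that is inconvenient, the standard `java.lang.management` API can produce a similar dump from inside the JVM. The sketch below is an illustration, not an Ignite feature: `ThreadDumps` is a hypothetical name, it only sees the JVM it runs in (so it would have to run on every node), and `findDeadlockedThreads()` only detects JVM-level deadlocks on monitors and ownable synchronizers, not waits on Ignite's cluster-wide transactions or locks.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical utility (not part of Ignite): in-process thread dump for one JVM.
final class ThreadDumps {
    private ThreadDumps() {}

    // Text dump of all live threads: name, state, the lock waited on, and full stack.
    static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // (true, true): also collect locked monitors and locked ownable synchronizers
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState());
            if (info.getLockName() != null) {
                sb.append(" on ").append(info.getLockName());
            }
            if (info.getLockOwnerName() != null) {
                sb.append(" owned by \"").append(info.getLockOwnerName()).append('"');
            }
            sb.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    // Ids of threads deadlocked on JVM monitors or ownable synchronizers; empty if none.
    static long[] deadlockedThreadIds() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? new long[0] : ids;
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
        System.out.println("JVM-level deadlocked threads: " + deadlockedThreadIds().length);
    }
}
```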

Finally, I would recommend switching to Ignite 1.5, or waiting a week or so for Ignite 1.6.

—
Denis
