Posted to user@ignite.apache.org by breischl <br...@gmail.com> on 2018/06/21 16:35:53 UTC

Deadlock during cache loading

We've run into a problem recently where it appears our cache is deadlocking
during loading. What I mean by "loading" is that we start up a new cluster
in AWS, unconnected to any existing cluster, and then shove a bunch of data
into it from Kafka. During this process it's not taking any significant
traffic - just healthchecks, ingesting data, and me clicking around in it. 

We've had several deployments in a row fail, apparently due to deadlocking
in the loading process. We're typically seeing a number of threads blocked
with stacktraces like this:

"data-streamer-stripe-3-#20" id=124 state=WAITING
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
    at
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
    at
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
    at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.invoke(GridDhtAtomicCache.java:786)
    at
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.invoke(IgniteCacheProxyImpl.java:1359)
    at
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.invoke(IgniteCacheProxyImpl.java:1405)
    at
org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.invoke(GatewayProtectedCacheProxy.java:1362)
    at
com.mycompany.myapp.myPackage.dao.ignite.cache.streamer.VersionCheckingStreamReceiver.receive(VersionCheckingStreamReceiver.java:33)
    at
org.apache.ignite.internal.processors.datastreamer.DataStreamerUpdateJob.call(DataStreamerUpdateJob.java:137)
    at
org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.localUpdate(DataStreamProcessor.java:397)
    at
org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.processRequest(DataStreamProcessor.java:302)
    at
org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.access$000(DataStreamProcessor.java:59)
    at
org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor$1.onMessage(DataStreamProcessor.java:89)
    at
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1555)
    at
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1183)
    at
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
    at
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1090)
    at
org.apache.ignite.internal.util.StripedExecutor$Stripe.run(StripedExecutor.java:505)
    at java.lang.Thread.run(Thread.java:748)


The machines seem to go into a moderate-CPU loop (~70% usage). My best guess
is that most of that CPU is going to threads like this:

"exchange-worker-#62" id=177 state=RUNNABLE
    at
org.apache.ignite.internal.util.tostring.SBLimitedLength.toString(SBLimitedLength.java:283)
    at
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toStringImpl(GridToStringBuilder.java:1012)
    at
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:826)
    at
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:783)
    at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicAbstractUpdateFuture.toString(GridDhtAtomicAbstractUpdateFuture.java:588)
    at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicSingleUpdateFuture.toString(GridDhtAtomicSingleUpdateFuture.java:134)
    at java.lang.String.valueOf(String.java:2994)
    at java.lang.StringBuilder.append(StringBuilder.java:131)
    at java.util.AbstractCollection.toString(AbstractCollection.java:462)
    at java.lang.String.valueOf(String.java:2994)
    at java.lang.StringBuilder.append(StringBuilder.java:131)
    at
org.apache.ignite.internal.processors.cache.CacheObjectsReleaseFuture.toString(CacheObjectsReleaseFuture.java:58)
    at java.lang.String.valueOf(String.java:2994)
    at java.lang.StringBuilder.append(StringBuilder.java:131)
    at java.util.AbstractCollection.toString(AbstractCollection.java:462)
    at java.lang.String.valueOf(String.java:2994)
    at java.lang.StringBuilder.append(StringBuilder.java:131)
    at
org.apache.ignite.internal.processors.cache.CacheObjectsReleaseFuture.toString(CacheObjectsReleaseFuture.java:58)
    at java.lang.String.valueOf(String.java:2994)
    at
org.apache.ignite.internal.util.GridStringBuilder.a(GridStringBuilder.java:101)
    at
org.apache.ignite.internal.util.tostring.SBLimitedLength.a(SBLimitedLength.java:88)
    at
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:939)
    at
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toStringImpl(GridToStringBuilder.java:1005)
    at
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:685)
    at
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:621)
    at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.toString(GridDhtPartitionsExchangeFuture.java:3555)
    at java.lang.String.valueOf(String.java:2994)
    at java.lang.StringBuilder.append(StringBuilder.java:131)
    at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.dumpDebugInfo(GridCachePartitionExchangeManager.java:1569)
    at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2359)
    at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:748)



I've seen elsewhere that putAll()/getAll() could cause deadlocks, but we're
not using those. I don't believe slow network is the problem. What else can
I look at or try to resolve this? Are we just throwing data into the caches
too fast? Could a weird pattern in the data (eg, large entities) cause this? 

I've attached a full thread dump in case that helps.

Thanks in advance,
BKR

IgniteStackTrace_redacted.txt
<http://apache-ignite-users.70518.x6.nabble.com/file/t1824/IgniteStackTrace_redacted.txt>  



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Deadlock during cache loading

Posted by smovva <su...@sturfee.com>.
I have a fairly similar setup. What type of EC2 instances are you using? Just
to compare with my setup.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Deadlock during cache loading

Posted by David Harvey <dh...@jobcase.com>.
Transactions are easy to use: see the examples in org.apache.ignite.examples.datagrid.store.auto.
We use them in the stream receiver. You simply bracket the get/put in the
transaction, but use a timeout, then bracket that with an "until done"
while loop, perhaps adding a sleep to back off.
We ended up with better performance with PESSIMISTIC transactions, though
we expected OPTIMISTIC to win.
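
A minimal sketch of that pattern, assuming a TRANSACTIONAL cache and using illustrative names (putIfNewer, isNewer) rather than code from this thread:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionConcurrency;
import org.apache.ignite.transactions.TransactionIsolation;

public class TxRetrySketch {
    // Bracket the get/compare/put in a PESSIMISTIC transaction with a timeout,
    // then wrap that in an "until done" loop with a small backoff sleep.
    static <K, V> void putIfNewer(Ignite ignite, IgniteCache<K, V> cache, K key, V newVal) {
        boolean done = false;
        while (!done) {
            try (Transaction tx = ignite.transactions().txStart(
                    TransactionConcurrency.PESSIMISTIC,
                    TransactionIsolation.REPEATABLE_READ,
                    5_000 /* timeout, ms */,
                    1 /* tx size hint */)) {
                V oldVal = cache.get(key);
                if (oldVal == null || isNewer(oldVal, newVal))
                    cache.put(key, newVal);
                tx.commit();
                done = true;
            }
            catch (Exception e) {
                // Timed out or failed to commit: back off briefly and retry.
                try {
                    Thread.sleep(100);
                }
                catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

    // Placeholder for the application's version comparison.
    static <V> boolean isNewer(V oldVal, V newVal) {
        return true;
    }
}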

My guess would be that the DataStreamer is not a fundamental contributor to the
deadlock you are seeing, and you may have discovered an Ignite bug.



On Sun, Jul 1, 2018 at 11:44 AM, breischl <br...@gmail.com> wrote:

> @DaveHarvey, I'll look at that tomorrow. Seems potentially complicated, but
> if that's what has to happen we'll figure it out.
>
> Interestingly, cutting the cluster to half as many nodes (by reducing the
> number of backups) seems to have resolved the issue. Is there a guideline
> for how large a cluster should be?
>
> We were running a single 44-node cluster, with 3 data backups (4 total
> copies) and hitting the issue consistently. I switched to running two
> separate clusters, each with 22 nodes using 1 data backup (2 total copies).
> The smaller clusters seem to work perfectly every time, though I haven't
> tried them as much.
>
>
> @smovva - We're still actively experimenting with instance and cluster
> sizing. We were running on c4.4xl instances. However we were barely using
> the CPUs, but consistently have memory issues (using a 20GB heap, plus a
> bit
> of off-heap). We just switched to r4.2xl instances which is working better
> for us so far, and is a bit cheaper. However I would imagine that the
> optimal size depends on your use case - it's basically a tradeoff between
> the memory, CPU, networking and operational cost requirements of your use
> case.
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>
>


Re: Deadlock during cache loading

Posted by breischl <br...@gmail.com>.
@DaveHarvey, I'll look at that tomorrow. Seems potentially complicated, but
if that's what has to happen we'll figure it out. 

Interestingly, cutting the cluster to half as many nodes (by reducing the
number of backups) seems to have resolved the issue. Is there a guideline
for how large a cluster should be? 

We were running a single 44-node cluster, with 3 data backups (4 total
copies) and hitting the issue consistently. I switched to running two
separate clusters, each with 22 nodes using 1 data backup (2 total copies).
The smaller clusters seem to work perfectly every time, though I haven't
tried them as much.


@smovva - We're still actively experimenting with instance and cluster
sizing. We were running on c4.4xl instances. However we were barely using
the CPUs, but consistently have memory issues (using a 20GB heap, plus a bit
of off-heap). We just switched to r4.2xl instances which is working better
for us so far, and is a bit cheaper. However I would imagine that the
optimal size depends on your use case - it's basically a tradeoff between
the memory, CPU, networking and operational cost requirements of your use
case. 



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Deadlock during cache loading

Posted by breischl <br...@gmail.com>.
Also, I probably should have mentioned this earlier, but we're not using the WAL
or any disk persistence. So everything should be in-memory, and generally
on-heap. I think that makes it less likely that we were blocked on the raw
throughput of some piece of hardware or virtual hardware.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Deadlock during cache loading

Posted by breischl <br...@gmail.com>.
(OT: Sorry about the duplicate posts, for some reason Nabble was refusing to
show me new posts so I thought my earlier ones had been lost.)

>Why did you decide, that cluster is deadlocked in the first place?

Because all of the Datastreamer threads were stuck waiting on locks, and no
progress was being made on loading the cache. We have various logging and
metrics around progress that were all zero, and all the threads trying to
load data were blocked waiting to insert more data. This persisted for an
hour or more with no change. 

>What did you see in logs of the failing nodes?
Nothing that jumped out at me as a smoking gun or even really related,
although I don't have the logs handy anymore as they've aged off our Splunk
servers. 



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Deadlock during cache loading

Posted by David Harvey <dh...@jobcase.com>.
Denis does have a point. When we were trying to run using GP2 storage,
the cluster would simply lock up for an hour. Once we moved to local SSDs
on i3 instances those issues went away (but we needed 2.5 to have the
streaming rate hold up, as we had a lot of data loaded). The i3
instances are rated at about 700,000 write IOPS, and we were only getting
about 20-30,000 out of GP2. You could separate or combine the WAL and
storage and hardly move the needle.
We will describe cluster snapshots on AWS in more detail when we have
completed that work.



On Mon, Jul 2, 2018 at 11:20 AM, Denis Mekhanikov <dm...@gmail.com>
wrote:

> Why did you decide, that cluster is deadlocked in the first place?
>
> > We've had several deployments in a row fail, apparently due to
> deadlocking in the loading process.
> What did you see in logs of the failing nodes?
>
> Denis
>
> Mon, 2 Jul 2018 at 17:08, breischl <br...@gmail.com>:
>
>> Ah, I had not thought of that, thanks.
>>
>> Interestingly, going to a smaller cluster seems to have worked around the
>> problem. We were running a 44-node cluster using 3 backups of the data.
>> Switching to two separate 22-node clusters, each with 1 backup, seems to
>> work just fine. Is there some limit to how large a cluster should be?
>>
>> @smovva - We were using c4.4xl instances, but switched to r4.2xl because
>> we
>> had spare CPU but kept having memory problems. I suspect that there isn't
>> a
>> "right" size to use, it just depends on the use case you have.
>>
>>
>>
>> --
>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>
>


Re: Deadlock during cache loading

Posted by Denis Mekhanikov <dm...@gmail.com>.
Why did you decide, that cluster is deadlocked in the first place?

> We've had several deployments in a row fail, apparently due to
deadlocking in the loading process.
What did you see in logs of the failing nodes?

Denis

Mon, 2 Jul 2018 at 17:08, breischl <br...@gmail.com>:

> Ah, I had not thought of that, thanks.
>
> Interestingly, going to a smaller cluster seems to have worked around the
> problem. We were running a 44-node cluster using 3 backups of the data.
> Switching to two separate 22-node clusters, each with 1 backup, seems to
> work just fine. Is there some limit to how large a cluster should be?
>
> @smovva - We were using c4.4xl instances, but switched to r4.2xl because we
> had spare CPU but kept having memory problems. I suspect that there isn't a
> "right" size to use, it just depends on the use case you have.
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>

Re: Deadlock during cache loading

Posted by breischl <br...@gmail.com>.
Ah, I had not thought of that, thanks. 

Interestingly, going to a smaller cluster seems to have worked around the
problem. We were running a 44-node cluster using 3 backups of the data.
Switching to two separate 22-node clusters, each with 1 backup, seems to
work just fine. Is there some limit to how large a cluster should be? 

@smovva - We were using c4.4xl instances, but switched to r4.2xl because we
had spare CPU but kept having memory problems. I suspect that there isn't a
"right" size to use, it just depends on the use case you have. 



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Deadlock during cache loading

Posted by David Harvey <dh...@jobcase.com>.
You can start a transaction in the stream receiver to make it atomic.

On Fri, Jun 29, 2018, 1:02 PM breischl <br...@gmail.com> wrote:

> StreamTransformer does an invoke() pretty much exactly like what I'm doing,
> so that would not seem to change anything.
>
>
> https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/stream/StreamTransformer.java#L50-L53
>
>
>
> I may try using a put(), but since I need to compare the existing cache
> value, I'd need to get(), compare, then put(). I thought that may open up a
> potential race condition, if two different updates happen close to each
> other they could each do get(), then each do put(), but potentially in the
> wrong order. Unless there's some locking I don't understand there?
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>
>


Re: Deadlock during cache loading

Posted by breischl <br...@gmail.com>.
StreamTransformer does an invoke() pretty much exactly like what I'm doing,
so that would not seem to change anything. 

https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/stream/StreamTransformer.java#L50-L53



I may try using a put(), but since I need to compare against the existing cache
value, I'd need to get(), compare, then put(). I thought that might open up a
potential race condition: if two different updates happen close to each
other, they could each do get(), then each do put(), but potentially in the
wrong order. Unless there's some locking I don't understand there?
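
For reference, here is a sketch of how that plain get/compare/put could be guarded with JCache-style conditional writes (putIfAbsent()/replace()) instead of explicit locking; isNewer() and the generic types are placeholders, not code from this thread:

import org.apache.ignite.IgniteCache;

public class ConditionalWriteSketch {
    // Read, compare versions, then write conditionally so two racing updates
    // can't land in the wrong order. Assumes the value type implements equals().
    static <K, V> void putIfNewer(IgniteCache<K, V> cache, K key, V newVal) {
        while (true) {
            V oldVal = cache.get(key);
            if (oldVal == null) {
                // putIfAbsent() returns false if someone else created the entry first.
                if (cache.putIfAbsent(key, newVal))
                    return;
            }
            else if (isNewer(oldVal, newVal)) {
                // replace(key, old, new) returns false if the entry changed since get().
                if (cache.replace(key, oldVal, newVal))
                    return;
            }
            else {
                return; // incoming value is older; discard it
            }
            // Lost the race: re-read and try again.
        }
    }

    // Placeholder for the application's version comparison.
    static <V> boolean isNewer(V oldVal, V newVal) {
        return true;
    }
}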



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Deadlock during cache loading

Posted by Denis Mekhanikov <dm...@gmail.com>.
Entries that are provided to the receive() method are immutable.
But you can either do cache.put() inside the receive() method, just
like DataStreamerCacheUpdaters#Individual
<https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/datastreamer/DataStreamerCacheUpdaters.java#L102>
does, or use StreamTransformer, just like in StreamTransformerExample
<https://github.com/apache/ignite/blob/master/examples/src/main/java/org/apache/ignite/examples/streaming/StreamTransformerExample.java>
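
For what it's worth, a minimal sketch of the put()-based receiver that advice describes (the class name is illustrative, and the version check discussed in this thread would wrap the put() call):

import java.util.Collection;
import java.util.Map;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteException;
import org.apache.ignite.stream.StreamReceiver;

public class PutBasedReceiver<K, V> implements StreamReceiver<K, V> {
    @Override
    public void receive(IgniteCache<K, V> cache, Collection<Map.Entry<K, V>> entries)
        throws IgniteException {
        for (Map.Entry<K, V> e : entries) {
            // Write each streamed entry with a plain put(); any "is this newer?"
            // check would go around this call.
            cache.put(e.getKey(), e.getValue());
        }
    }
}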

Denis

Fri, 29 Jun 2018 at 19:17, breischl <br...@gmail.com>:

> Hi Denis,
>   It was not clear to me that we could do the update from within the
> StreamReceiver without some sort of cache operation. Would we just use the
> CacheEntry.setValue() method to do that? Something roughly like the
> following?
>
> Thanks!
>
>
> public void receive(IgniteCache<TKey, TEntity> cache,
> Collection<Map.Entry<TKey, TEntity>> newEntries) throws IgniteException
> {
>
>         for (val newEntry : newEntries) {
>             //Do our custom logic
>             newEntry.setValue(someNewValue);
>         }
> }
>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>

Re: Deadlock during cache loading

Posted by breischl <br...@gmail.com>.
Hi Denis,
  It was not clear to me that we could do the update from within the
StreamReceiver without some sort of cache operation. Would we just use the
CacheEntry.setValue() method to do that? Something roughly like the
following?

Thanks!


public void receive(IgniteCache<TKey, TEntity> cache,
    Collection<Map.Entry<TKey, TEntity>> newEntries) throws IgniteException {

    for (val newEntry : newEntries) {
        // Do our custom logic
        newEntry.setValue(someNewValue);
    }
}





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Deadlock during cache loading

Posted by Denis Mekhanikov <dm...@gmail.com>.
Hi!

Why do you do this inside an invoke()?
All of this can be done just inside a receiver.
Can you get rid of the invoke and check that the deadlocks disappear?

Denis

Fri, 29 Jun 2018 at 17:24, breischl <br...@gmail.com>:

> That does seem to be what's happening, but we're only invoke()'ing on keys
> that were passed into receive(), so that should not require going off-box.
> Right?
>
> Here's the relevant code...
>
>
> @Override
> public void receive(IgniteCache<TKey, TEntity> cache,
> Collection<Map.Entry<TKey, TEntity>> newEntries) throws IgniteException
> {
>
>     for (val newEntry : newEntries) {
>         val entryKey = newEntry.getKey();
>
>         cache.invoke(entryKey, ((CacheEntryProcessor<TKey, TEntity,
> Object>)
> (entry, args) -> {
>             val key =
>                 (TKey) args[0]; //passed this in to make the lambda
> non-capturing, which is a slight perf optimization (fewer memory allocs)
>             val newVal = (TEntity) args[1];
>             val oldVal = entry.getValue();
>
>             if (oldVal == null) {
>                 //Didn't already exist, we can just set the new values and
> be done
>                 log.info("event=receiverCreatingNewInstance key={}
> newValue={}", key, newVal);
>                 entry.setValue(newVal);
>             } else if (isNewer(oldVal, newVal)) {
>                 log.info("event=newEntryHasHigherVersion key={}
> oldVersion={} newVersion={}", key, getVersionForLogging(oldVal),
>                     getVersionForLogging(newVal));
>                 entry.setValue(newVal);
>             } else {
>                 log.info("event=newEntryHasLowerVersion key={}
> oldVersion={}
> newVersion={}", key, getVersionForLogging(oldVal),
>                     getVersionForLogging(newVal));
>             }
>             return null;
>         }), entryKey, newEntry.getValue());
>     }
> }
>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>

RE: Deadlock during cache loading

Posted by breischl <br...@gmail.com>.
That does seem to be what's happening, but we're only invoke()'ing on keys
that were passed into receive(), so that should not require going off-box.
Right? 

Here's the relevant code...


@Override
public void receive(IgniteCache<TKey, TEntity> cache,
    Collection<Map.Entry<TKey, TEntity>> newEntries) throws IgniteException {

    for (val newEntry : newEntries) {
        val entryKey = newEntry.getKey();

        cache.invoke(entryKey, ((CacheEntryProcessor<TKey, TEntity, Object>) (entry, args) -> {
            // Passed the key in to make the lambda non-capturing, which is a
            // slight perf optimization (fewer memory allocs)
            val key = (TKey) args[0];
            val newVal = (TEntity) args[1];
            val oldVal = entry.getValue();

            if (oldVal == null) {
                // Didn't already exist, we can just set the new values and be done
                log.info("event=receiverCreatingNewInstance key={} newValue={}", key, newVal);
                entry.setValue(newVal);
            } else if (isNewer(oldVal, newVal)) {
                log.info("event=newEntryHasHigherVersion key={} oldVersion={} newVersion={}",
                    key, getVersionForLogging(oldVal), getVersionForLogging(newVal));
                entry.setValue(newVal);
            } else {
                log.info("event=newEntryHasLowerVersion key={} oldVersion={} newVersion={}",
                    key, getVersionForLogging(oldVal), getVersionForLogging(newVal));
            }
            return null;
        }), entryKey, newEntry.getValue());
    }
}





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

RE: Deadlock during cache loading

Posted by Dave Harvey <dh...@jobcase.com>.
Your original stack trace shows a call to your custom stream receiver, which
appears to itself call invoke(). I can only guess at what your code does, but
it appears to be making a call off-node to something that is not returning.

    at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.invoke(GatewayProtectedCacheProxy.java:1362)
    at com.mycompany.myapp.myPackage.dao.ignite.cache.streamer.VersionCheckingStreamReceiver.receive(VersionCheckingStreamReceiver.java:33)
    at org.apache.ignite.internal.processors.datastreamer.DataStreamerUpdateJob.call(DataStreamerUpdateJob.java:137)


--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

RE: Deadlock during cache loading

Posted by breischl <br...@gmail.com>.
>our stream receiver that called invoke(), and that in turn did another invoke,
which was the actual bug.
So Ignite's invoke() implementation called itself?


>It was helpful when we did the invoke using a custom thread pool,
I'm not sure I understand the concept here. Is the idea to have an
ExecutorService inside the StreamReceiver, and use that to call invoke()?


It seems odd that it could get hung up on the get() call, since presumably
this is being invoked on the primary and therefore should not need to go
out anywhere. But I notice that it's still trying to map onto the grid
topology, so maybe it hangs if the topology changes before the StreamReceiver is
invoked? Total guess, I have only a vague idea of Ignite internals.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

RE: Deadlock during cache loading

Posted by breischl <br...@gmail.com>.
Also...

>What you showed is that the stream receiver called invoke() and did not get an
answer, not a deadlock.

It's not that I'm getting back a null, it's that all the threads are blocked
waiting on the invoke() call, and no progress is being made. That sounds a
lot like a deadlock. I guess you could say the problem is that it's just
never returning, but that seems like a distinction without a difference. 



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

RE: Deadlock during cache loading

Posted by Dave Harvey <dh...@jobcase.com>.
2.4 should be OK.
What you showed is that the stream receiver called invoke() and did not get an
answer, not a deadlock. Nothing looks particularly wrong there. When we
created this bug, it was our stream receiver that called invoke(), and that in
turn did another invoke, which was the actual bug.

It was helpful when we did the invoke using a custom thread pool: because
the logging reports threads in the custom pool, we could easily see which node
had active custom threads, and then look at what that thread was waiting
for.
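
A rough sketch of that idea (illustrative only, not the actual code from that setup): run the invoke() from an ExecutorService whose threads carry a recognizable name prefix, so thread dumps show which nodes have active receiver work.

import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteException;
import org.apache.ignite.cache.CacheEntryProcessor;
import org.apache.ignite.stream.StreamReceiver;

public class PooledInvokeReceiver<K, V> implements StreamReceiver<K, V> {
    private static final AtomicInteger SEQ = new AtomicInteger();
    // Named threads so thread dumps/logs show where receiver work is active.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(4,
        r -> new Thread(r, "custom-receiver-invoke-" + SEQ.incrementAndGet()));

    @Override
    public void receive(IgniteCache<K, V> cache, Collection<Map.Entry<K, V>> entries)
        throws IgniteException {
        for (Map.Entry<K, V> e : entries) {
            try {
                // The streamer thread blocks here, but the cache call itself
                // runs on an easily identifiable custom-pool thread.
                POOL.submit(() -> cache.invoke(e.getKey(),
                    (CacheEntryProcessor<K, V, Object>) (entry, args) -> {
                        entry.setValue((V) args[0]);
                        return null;
                    },
                    e.getValue())).get();
            }
            catch (Exception ex) {
                throw new IgniteException(ex);
            }
        }
    }
}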





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

RE: Deadlock during cache loading

Posted by breischl <br...@gmail.com>.
Thanks Dave. I am using Ignite v2.4.0. Would a newer version potentially
help?

This problem seems to come and go. I didn't hit it for a few days, and now
we've hit it on two deployments in a row. It may be some sort of timing or
external factor that provokes it. In the most recent case we hit the same
deadlock in the DataStreamers, but did not see the moderate-CPU behavior or
the threads stuck in the StringBuilder code. So it seems like that was just
another side effect.

Any other ideas of things to investigate or try would be fantastic.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

RE: Deadlock during cache loading

Posted by Dave Harvey <dh...@jobcase.com>.
"When receiver is invoked for key K, it’s holding the lock for K."  is not
correct, at least in the 2.4 code.

When a custom stream receiver is called, the data streamer thread has a
read-lock preventing termination, and there is a real-lock on the topology,
but DataStreamerUpdateJob.call() does not get any per entry locks.

Since the DataStreamer threads are in a separate pool, a custom stream
receiver should be able to make any calls that a client can w/o fear of
deadlock.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

RE: Deadlock during cache loading

Posted by breischl <br...@gmail.com>.
In our case we're only using the receiver as you describe, to update the key
that it was invoked for. Our actual use case is that the incoming stream of
data sometimes sends us old data, which we want to discard rather than
cache. So the StreamReceiver examines the value already in the cache and
either applies the update or discards it. I re-examined our code, and it
only operates on keys that were supplied in the second arg to the receive()
function (the Collection<Entry<K, V>> of updated entries).

From your description it seems like this should be safe, and yet it seems to
be hitting a deadlock somewhere...



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

RE: Deadlock during cache loading

Posted by Stanislav Lukyanov <st...@gmail.com>.
Well, that’s diving a bit deeper than the “don’t do cache operations” rule of thumb, but let’s do that.

When receiver is invoked for key K, it’s holding the lock for K.
It is safe to do invoke on that K (especially if you control the invoked code) since it is locked already.
But it is not safe to call invoke on another key J – because someone holding the lock for J might be doing the same for K, leading to a deadlock.
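
As an illustration of that unsafe case (not code from this thread), a receiver that calls invoke() on some other key from the streamer/system thread looks roughly like this:

import java.util.Collection;
import java.util.Map;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteException;
import org.apache.ignite.cache.CacheEntryProcessor;
import org.apache.ignite.stream.StreamReceiver;

// Hypothetical anti-pattern: while servicing key K, the receiver requests a
// lock on a different key J (here a shared "summary" entry). If another
// thread holding J does the same for K, the two can wait on each other.
public class CrossKeyReceiver<K, V> implements StreamReceiver<K, V> {
    private final K summaryKey; // some key J != K

    public CrossKeyReceiver(K summaryKey) {
        this.summaryKey = summaryKey;
    }

    @Override
    public void receive(IgniteCache<K, V> cache, Collection<Map.Entry<K, V>> entries)
        throws IgniteException {
        for (Map.Entry<K, V> e : entries) {
            cache.invoke(summaryKey, (CacheEntryProcessor<K, V, Object>) (entry, args) -> {
                // Update the other entry based on the received value.
                entry.setValue((V) args[0]);
                return null;
            }, e.getValue());
        }
    }
}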

I believe it is really awkward to micromanage these locks, so the best practice is to avoid starting any cache operations (or, more generally, any locking operations – including put()/get()) from the system pool threads, i.e. when executing things like a StreamReceiver, a Continuous Query listener, an invoke() closure, etc. – basically anything that is intercepting a cache operation.

Thanks,
Stan

From: breischl
Sent: 22 June 2018, 18:09
To: user@ignite.apache.org
Subject: RE: Deadlock during cache loading

Hi Stan,
  Thanks for taking a look. I'm having trouble finding anywhere that it's
documented what I can or can't call inside a receiver. Is it just
put()/get() that are allowed? 

  Also, I noticed that the default StreamTransformer implementation calls
invoke() from within a receiver. So is that broken/deadlock-prone as well?

https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/stream/StreamTransformer.java#L50-L53

Thanks!
BKR



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


RE: Deadlock during cache loading

Posted by breischl <br...@gmail.com>.
Hi Stan,
  Thanks for taking a look. I'm having trouble finding anywhere that it's
documented what I can or can't call inside a receiver. Is it just
put()/get() that are allowed? 

  Also, I noticed that the default StreamTransformer implementation calls
invoke() from within a receiver. So is that broken/deadlock-prone as well?

https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/stream/StreamTransformer.java#L50-L53

Thanks!
BKR



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

RE: Deadlock during cache loading

Posted by Stanislav Lukyanov <st...@gmail.com>.
Hi,

Looks like you’re performing a cache operation (invoke()) from a StreamReceiver – this is not allowed.
Check out this SO answer https://stackoverflow.com/questions/43891757/closures-stuck-in-2-0-when-try-to-add-an-element-into-the-queue.

Stan

From: breischl
Sent: 21 June 2018, 19:35
To: user@ignite.apache.org
Subject: Deadlock during cache loading



Re: Deadlock during cache loading

Posted by breischl <br...@gmail.com>.
Just found a bunch of these in my logs as well. Note this is showing
starvation in the system thread pool, not the DataStreamer thread pool, but
perhaps they're related?

[2018-06-28T17:39:55,728Z](grid-timeout-worker-#23)([]) WARN - G - >>>
Possible starvation in striped pool.
    Thread name: sys-stripe-4-#5
    Queue:
[o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@833da92,
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@33e4268f,
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@69c904f6,
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@5b9aa1b6,
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@4deb071d,
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@5fa99071,
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@7e66c1c6,
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@707f48ad,
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@65396a50,
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@7600549e,
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@3e20c369,
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@3410de20,
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@1ad55918,
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@6e054a78,
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@5606e75a,
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@6455c264,
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@54784a6f,
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@10ea9c12,
Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8,
ordered=false, timeout=0, skipOnTimeout=false,
msg=GridDhtAtomicSingleUpdateRequest [key=KeyCacheObjectImpl [part=468,
val=null, hasValBytes=true], val=CacheObjectImpl [val=null,
hasValBytes=true], prevVal=null, super=GridDhtAtomicAbstractUpdateRequest
[onRes=false, nearNodeId=null, nearFutId=0, flags=]]]], Message closure
[msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false,
timeout=0, skipOnTimeout=false, msg=GridDhtAtomicSingleUpdateRequest
[key=KeyCacheObjectImpl [part=468, val=null, hasValBytes=true],
val=CacheObjectImpl [val=null, hasValBytes=true], prevVal=null,
super=GridDhtAtomicAbstractUpdateRequest [onRes=false, nearNodeId=null,
nearFutId=0, flags=]]]], Message closure [msg=GridIoMessage [plc=2,
topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0,
skipOnTimeout=false, msg=GridDhtAtomicSingleUpdateRequest
[key=KeyCacheObjectImpl [part=468, val=null, hasValBytes=true],
val=CacheObjectImpl [val=null, hasValBytes=true], prevVal=null,
super=GridDhtAtomicAbstractUpdateRequest [onRes=false, nearNodeId=null,
nearFutId=0, flags=]]]], Message closure [msg=GridIoMessage [plc=2,
topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0,
skipOnTimeout=false, msg=GridDhtAtomicSingleUpdateRequest
[key=KeyCacheObjectImpl [part=468, val=null, hasValBytes=true],
val=CacheObjectImpl [val=null, hasValBytes=true], prevVal=null,
super=GridDhtAtomicAbstractUpdateRequest [onRes=false, nearNodeId=null,
nearFutId=0, flags=]]]], Message closure [msg=GridIoMessage [plc=2,
topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0,
skipOnTimeout=false, msg=GridDhtAtomicSingleUpdateRequest
[key=KeyCacheObjectImpl [part=492, val=null, hasValBytes=true],
val=CacheObjectImpl [val=null, hasValBytes=true], prevVal=null,
super=GridDhtAtomicAbstractUpdateRequest [onRes=false, nearNodeId=null,
nearFutId=0, flags=]]]], Message closure [msg=GridIoMessage [plc=2,
topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0,
skipOnTimeout=false, msg=GridDhtAtomicSingleUpdateRequest
[key=KeyCacheObjectImpl [part=492, val=null, hasValBytes=true],
val=CacheObjectImpl [val=null, hasValBytes=true], prevVal=null,
super=GridDhtAtomicAbstractUpdateRequest [onRes=false, nearNodeId=null,
nearFutId=0, flags=]]]], Message closure [msg=GridIoMessage [plc=2,
topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0,
skipOnTimeout=false, msg=GridDhtAtomicSingleUpdateRequest
[key=KeyCacheObjectImpl [part=492, val=null, hasValBytes=true],
val=CacheObjectImpl [val=null, hasValBytes=true], prevVal=null,
super=GridDhtAtomicAbstractUpdateRequest [onRes=false, nearNodeId=null,
nearFutId=0, flags=]]]], Message closure [msg=GridIoMessage [plc=2,
topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0,
skipOnTimeout=false, msg=GridDhtAtomicSingleUpdateRequest
[key=KeyCacheObjectImpl [part=492, val=null, hasValBytes=true],
val=CacheObjectImpl [val=null, hasValBytes=true], prevVal=null,
super=GridDhtAtomicAbstractUpdateRequest [onRes=false, nearNodeId=null,
nearFutId=0, flags=]]]], Message closure [msg=GridIoMessage [plc=2,
topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0,
skipOnTimeout=false, msg=GridDhtAtomicSingleUpdateRequest
[key=KeyCacheObjectImpl [part=492, val=null, hasValBytes=true],
val=CacheObjectImpl [val=null, hasValBytes=true], prevVal=null,
super=GridDhtAtomicAbstractUpdateRequest [onRes=false, nearNodeId=null,
nearFutId=0, flags=]]]], Message closure [msg=GridIoMessage [plc=2,
topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0,
skipOnTimeout=false, msg=GridDhtAtomicSingleUpdateRequest
[key=KeyCacheObjectImpl [part=492, val=null, hasValBytes=true],
val=CacheObjectImpl [val=null, hasValBytes=true], prevVal=null,
super=GridDhtAtomicAbstractUpdateRequest [onRes=false, nearNodeId=null,
nearFutId=0, flags=]]]], Message closure [msg=GridIoMessage [plc=2,
topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0,
skipOnTimeout=false, msg=GridDhtAtomicSingleUpdateRequest
[key=KeyCacheObjectImpl [part=492, val=null, hasValBytes=true],
val=CacheObjectImpl [val=null, hasValBytes=true], prevVal=null,
super=GridDhtAtomicAbstractUpdateRequest [onRes=false, nearNodeId=null,
nearFutId=0, flags=]]]], Message closure [msg=GridIoMessage [plc=2,
topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0,
skipOnTimeout=false, msg=GridDhtAtomicSingleUpdateRequest
[key=KeyCacheObjectImpl [part=492, val=null, hasValBytes=true],
val=CacheObjectImpl [val=null, hasValBytes=true], prevVal=null,
super=GridDhtAtomicAbstractUpdateRequest [onRes=false, nearNodeId=null,
nearFutId=0, flags=]]]], Message closure [msg=GridIoMessage [plc=2,
topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0,
skipOnTimeout=false, msg=GridDhtAtomicSingleUpdateRequest
[key=KeyCacheObjectImpl [part=492, val=null, hasValBytes=true],
val=CacheObjectImpl [val=null, hasValBytes=true], prevVal=null,
super=GridDhtAtomicAbstractUpdateRequest [onRes=false, nearNodeId=null,
nearFutId=0, flags=]]]], Message closure [msg=GridIoMessage [plc=2,
topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0,
skipOnTimeout=false, msg=GridDhtAtomicSingleUpdateRequest
[key=KeyCacheObjectImpl [part=492, val=null, hasValBytes=true],
val=CacheObjectImpl [val=null, hasValBytes=true], prevVal=null,
super=GridDhtAtomicAbstractUpdateRequest [onRes=false, nearNodeId=null,
nearFutId=0, flags=]]]], Message closure [msg=GridIoMessage [plc=2,
topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0,
skipOnTimeout=false, msg=GridDhtAtomicSingleUpdateRequest
[key=KeyCacheObjectImpl [part=492, val=null, hasValBytes=true],
val=CacheObjectImpl [val=null, hasValBytes=true], prevVal=null,
super=GridDhtAtomicAbstractUpdateRequest [onRes=false, nearNodeId=null,
nearFutId=0, flags=]]]], Message closure [msg=GridIoMessage [plc=2,
topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0,
skipOnTimeout=false, msg=GridDhtAtomicSingleUpdateRequest
[key=KeyCacheObjectImpl [part=492, val=null, hasValBytes=true],
val=CacheObjectImpl [val=null, hasValBytes=true], prevVal=null,
super=GridDhtAtomicAbstractUpdateRequest [onRes=false, nearNodeId=null,
nearFutId=0, flags=]]]], Message closure [msg=GridIoMessage [plc=2,
topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0,
skipOnTimeout=false, msg=GridDhtAtomicSingleUpdateRequest
[key=KeyCacheObjectImpl [part=132, val=null, hasValBytes=true],
val=CacheObjectImpl [val=null, hasValBytes=true], prevVal=null,
super=GridDhtAtomicAbstractUpdateRequest [onRes=false, nearNodeId=null,
nearFutId=0, flags=]]]], Message closure [msg=GridIoMessage [plc=2,
topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0,
skipOnTimeout=false, msg=GridDhtAtomicSingleUpdateRequest
[key=KeyCacheObjectImpl [part=132, val=null, hasValBytes=true],
val=CacheObjectImpl [val=null, hasValBytes=true], prevVal=null,
super=GridDhtAtomicAbstractUpdateRequest [onRes=false, nearNodeId=null,
nearFutId=0, flags=]]]], Message closure [msg=GridIoMessage [plc=2,
topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0,
skipOnTimeout=false, msg=GridDhtAtomicSingleUpdateRequest
[key=KeyCacheObjectImpl [part=132, val=null, hasValBytes=true],
val=CacheObjectImpl [val=null, hasValBytes=true], prevVal=null,
super=GridDhtAtomicAbstractUpdateRequest [onRes=false, nearNodeId=null,
nearFutId=0, flags=]]]], Message closure [msg=GridIoMessage [plc=2,
topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0,
skipOnTimeout=false, msg=GridDhtAtomicSingleUpdateRequest
[key=KeyCacheObjectImpl [part=132, val=null, hasValBytes=true],
val=CacheObjectImpl [val=null, hasValBytes=true], prevVal=null,
super=GridDhtAtomicAbstractUpdateRequest [onRes=false, nearNodeId=null,
nearFutId=0, flags=]]]], Message closure [msg=GridIoMessage [plc=2,
topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0,
skipOnTimeout=false, msg=GridDhtAtomicSingleUpdateRequest
[key=KeyCacheObjectImpl [part=132, val=null, hasValBytes=true],
val=CacheObjectImpl [val=null, hasValBytes=true], prevVal=null,
super=GridDhtAtomicAbstractUpdateRequest [onRes=false, nearNodeId=null,
nearFutId=0, flags=]]]], Message closure [msg=GridIoMessage [plc=2,
topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0,
skipOnTimeout=false, msg=GridDhtAtomicSingleUpdateRequest [key=K
    Deadlock: false
    Completed: 1010831
Thread [name="sys-stripe-4-#5", id=101, state=WAITING, blockCnt=596, waitCnt=150343]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
        at o.a.i.i.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
        at o.a.i.i.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
        at o.a.i.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2799)
        at o.a.i.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2621)
        at o.a.i.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2585)
        at o.a.i.i.managers.communication.GridIoManager.send(GridIoManager.java:1642)
        at o.a.i.i.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:1714)
        at o.a.i.i.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1166)
        at o.a.i.i.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1205)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.sendDeferredUpdateResponse(GridDhtAtomicCache.java:3375)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$3300(GridDhtAtomicCache.java:130)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout.run(GridDhtAtomicCache.java:3612)
        at o.a.i.i.util.StripedExecutor$Stripe.run(StripedExecutor.java:505)
        at java.lang.Thread.run(Thread.java:748)



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/