Posted to user@ignite.apache.org by Humphrey <hm...@gmail.com> on 2019/01/23 13:34:01 UTC

Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour

Hello everyone,

I'm getting the error below when running more than 1 ServerNode.

What we want to achieve is the following:

1) A client node will be adding data (ALPHA) to a partitioned cache
(CACHE_ALPHA).
2) In the cluster (server nodes) we have a Node-Singleton Service deployed,
which has a continuous query to handle the CREATED events of the data added
from the client on local cache (cache.setLocal(true)).
3) For each ALPHA event we should generate one or more BRAVO entries and add
them to the cache (CACHE_BRAVO); this is done by a compute task in the event
handler.

This seems to work fine until we start a second ServerNode. What are we
doing wrong here? We would like to process the events generated by the
CACHE_ALPHA with compute tasks. Eventually we would like to have another
Service as well for handling events of CACHE_BRAVO, but we are already facing
problems handling events of the continuous query of one cache on multiple
server nodes.

I have a reproducer attached here.

striped-pool-starvation.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t1004/striped-pool-starvation.zip>  

Humphrey






--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour

Posted by Humphrey <hm...@gmail.com>.

Thread [name="sys-stripe-1-#2", id=17, state=WAITING, blockCnt=8, waitCnt=8]
    Lock [object=java.util.concurrent.Semaphore$NonfairSync@6915e1ad, ownerName=null, ownerId=-1]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
        at o.a.i.i.processors.cache.GridCacheAdapter.asyncOpAcquire(GridCacheAdapter.java:4517)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.asyncOp(GridDhtAtomicCache.java:756)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update0(GridDhtAtomicCache.java:1144)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.putAsync0(GridDhtAtomicCache.java:641)
        at o.a.i.i.processors.cache.GridCacheAdapter.putAsync(GridCacheAdapter.java:2828)
        at o.a.i.i.processors.cache.GridCacheAdapter.putAsync(GridCacheAdapter.java:2809)
        at o.a.i.i.processors.cache.IgniteCacheProxyImpl.putAsync0(IgniteCacheProxyImpl.java:1125)
        at o.a.i.i.processors.cache.IgniteCacheProxyImpl.putAsync(IgniteCacheProxyImpl.java:1114)
        at o.a.i.i.processors.cache.GatewayProtectedCacheProxy.putAsync(GatewayProtectedCacheProxy.java:832)
        at nl.project.training.ignite.service.ServiceImpl.lambda$genererate$1(ServiceImpl.java:56)




Re: Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour

Posted by Ilya Kasnacheev <il...@gmail.com>.

Hello!

1) I think it's the public thread pool. Your solution should be OK.
2) Right. When would you listen on this future? I hope it isn't in the event
listener :)

Regards,
-- 
Ilya Kasnacheev


In reply to Humphrey <hm...@gmail.com>, Mon, 28 Jan 2019 at 17:15.

Re: Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour

Posted by Humphrey <hm...@gmail.com>.

Hi Ilya,

1) Which thread pool is used by compute? Is that the Ignite public thread
pool [1]?

I'm now using the following in my event listener:

CompletableFuture.runAsync(() -> {
      ignite.compute().run(new MyRunnable(event.getValue()));
}, Executors.newFixedThreadPool(10));

This seems to work now, but I'm not sure if this is the correct way to handle
long-running events.
2) I think this will queue all those jobs until a thread (one of the 10)
finishes its job, right?

I've also tried compute.runAsync() and then listening on the future, with the
put done in the callback method.
3) Which of these is the best approach?
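
On question 2, a small JDK-only sketch may help. As written, the snippet above calls Executors.newFixedThreadPool(10) inside the listener, so if it runs once per event, each event gets its own 10-thread pool and nothing is queued across events (and those pools are never shut down). The sketch below assumes the pool is created once and shared. The names SharedPool and maxConcurrent, and the 20 ms sleep standing in for ignite.compute().run(...), are illustrative assumptions and not part of the reproducer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedPool {
    // Submits `jobs` tasks to ONE shared fixed pool and reports the highest
    // number of tasks that were running at the same time.
    static int maxConcurrent(int jobs, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize); // created once, reused for every job
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<CompletableFuture<Void>> futures = new ArrayList<>();

        for (int i = 0; i < jobs; i++) {
            futures.add(CompletableFuture.runAsync(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(20); // placeholder for the long-running job (assumption)
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                running.decrementAndGet();
            }, pool));
        }

        // Wait for every job, then release the pool's threads.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        pool.shutdown();
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrency: " + maxConcurrent(25, 10));
    }
}
```

With a shared pool of 10 and 25 jobs, no more than 10 jobs run at once and the rest wait in the executor's queue, which is the behaviour question 2 describes.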

Humphrey


[1] https://apacheignite.readme.io/docs/thread-pools





Re: Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour

Posted by Ilya Kasnacheev <il...@gmail.com>.

Hello!

It may work by pure chance and then eventually lock up.

Actually, putAsync will execute synchronously if it is called on the data node
for the given key. It is recommended to avoid any cache operations in event
listeners. In the event listener you can put the data on a queue and dequeue
it in separate threads, running put() from there. Can you try this approach
and see if it helps?
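
The queue hand-off suggested above could look roughly like the following. This is a JDK-only sketch under stated assumptions: a ConcurrentHashMap stands in for CACHE_BRAVO, and the names QueueHandOff, onAlphaCreated, drain and processAll are hypothetical. In a real service the worker threads would call put() on the Ignite cache instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class QueueHandOff {
    // Sentinel that tells a worker to stop; compared by identity, never by value.
    private static final String STOP = new String("stop");

    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private final Map<String, String> bravoCache = new ConcurrentHashMap<>(); // stand-in for CACHE_BRAVO

    // Called from the event listener: it only enqueues and returns immediately.
    public void onAlphaCreated(String alpha) {
        pending.add(alpha);
    }

    // Worker loop: runs on its own thread, so a blocking put() is harmless here.
    private void drain() {
        try {
            for (String alpha = pending.take(); alpha != STOP; alpha = pending.take()) {
                bravoCache.put("bravo-" + alpha, alpha.toUpperCase());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Feeds the events through the queue with `workers` threads and returns the result.
    public static Map<String, String> processAll(List<String> events, int workers) {
        QueueHandOff handOff = new QueueHandOff();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.execute(handOff::drain);
        }
        events.forEach(handOff::onAlphaCreated);
        for (int i = 0; i < workers; i++) {
            handOff.pending.add(STOP); // one stop marker per worker, queued after all events
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return handOff.bravoCache;
    }

    public static void main(String[] args) {
        System.out.println(processAll(Arrays.asList("a", "b", "c"), 2));
    }
}
```

The listener-side method does nothing except enqueue, so the thread that delivers the event is released right away. All blocking work happens on the worker threads.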

Is your last lockup still related to cache events?

Regards,
-- 
Ilya Kasnacheev


In reply to Humphrey <hm...@gmail.com>, Thu, 24 Jan 2019 at 17:37.

Re: Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour

Posted by Humphrey <hm...@gmail.com>.

Can you also clarify which thread pool is used for cache.put() /
cache.putAsync()?

I'm getting a lockup with two nodes when putting data into the cache with
map.forEach((key, value) -> cache.putAsync(key, value));
I could also try putAllAsync(), but I don't know if that is better than
putAsync().






Re: Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour

Posted by Humphrey <hm...@gmail.com>.

Thanks Ilya,

It works with runAsync.

(Question) Can you clarify why it works well on a single node but fails with
exceptions on two nodes? I expected the same to happen on a single server
node.

Humphrey




Re: Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour

Posted by Ilya Kasnacheev <il...@gmail.com>.

Hello!

As a general principle, you should avoid blocking operations in event
handlers, which is precisely what you are doing.

If you replace run() with runAsync() in your service implementation, it will
finish fine with two ServerNodes.
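
To illustrate the general principle, here is a JDK-only analogy and not a description of Ignite internals. A single-thread executor stands in for the thread that delivers events. A handler that waits on follow-up work scheduled on that same thread never finishes, while a handler that only schedules the work returns at once. The names NonBlockingHandler and handle are hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class NonBlockingHandler {
    // Returns true if the handler completed within timeoutMs, false if it was still stuck.
    static boolean handle(boolean blocking, long timeoutMs) {
        // Stand-in for the single thread that delivers events (assumption for this sketch).
        ExecutorService stripe = Executors.newSingleThreadExecutor();
        try {
            Future<?> handler = stripe.submit(() -> {
                // Follow-up work that needs the same thread to make progress.
                Future<?> followUp = stripe.submit(() -> { });
                if (blocking) {
                    try {
                        followUp.get(); // waits for work that can only run after this handler returns
                    } catch (Exception e) {
                        throw new IllegalStateException(e);
                    }
                }
                // Non-blocking variant: return immediately and let followUp run afterwards.
            });
            try {
                handler.get(timeoutMs, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false; // the handler is blocked on its own follow-up
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        } finally {
            stripe.shutdownNow(); // interrupts a stuck handler so the JVM can exit
        }
    }

    public static void main(String[] args) {
        System.out.println("async handler finished:    " + handle(false, 2000));
        System.out.println("blocking handler finished: " + handle(true, 300));
    }
}
```

In this analogy the blocking variant corresponds to calling run() inside the handler and the non-blocking variant to runAsync().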

Regards,
-- 
Ilya Kasnacheev


In reply to Humphrey <hm...@gmail.com>, Wed, 23 Jan 2019 at 16:34.