Posted to user@ignite.apache.org by AravindJP <ar...@gmail.com> on 2020/07/26 03:35:16 UTC

Blocked system-critical thread has been detected

I have a Kubernetes cluster (on GCP) running Apache Ignite 2.8.1 (upgraded
from 2.8.0) with GridGain Control Center installed. For the last week the
Ignite cluster has had zero load (no read/write requests). However, I am
seeing the exception below on a cluster node, with a lot of threads in the
TIMED_WAITING and WAITING states. Any clue why this happens? This is the
second time it has occurred without any load on the cluster. Last week I had
the same issue, restarted the cluster, and kept it idle to confirm the
behaviour. I have uploaded the complete log here:

logs-asia-ignite.gz
<http://apache-ignite-users.70518.x6.nabble.com/file/t2807/logs-asia-ignite.gz>




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Blocked system-critical thread has been detected

Posted by aealexsandrov <ae...@gmail.com>.

Hi,

Your log doesn't contain full thread dumps, and some information is missing
(e.g. Topology Snapshots). However, I see that the checkpoint thread was
blocked for a long time:

[02:45:50,849][SEVERE][tcp-disco-msg-worker-[3dac150e
10.20.4.18:47500]-#2][G] Blocked system-critical thread has been detected.
This can lead to cluster-wide undefined behaviour
[workerName=db-checkpoint-thread, threadName=db-checkpoint-thread-#54,
blockedFor=172s]

But I see that it was blocked for no longer than 3 minutes.
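
To see how often and for how long workers were reported as blocked, the log
can be scanned for this message. Below is a minimal Python sketch, assuming
the message has the same field layout as the line quoted above (workerName,
threadName, blockedFor in seconds). The function name is made up for
illustration and is not part of Ignite.

```python
import re

# Pattern based on the log line quoted in this thread; other Ignite
# versions may format the message differently (assumption).
BLOCKED_RE = re.compile(
    r"Blocked system-critical thread has been detected\..*?"
    r"workerName=(?P<worker>[^,\]]+),\s*"
    r"threadName=(?P<thread>[^,\]]+),\s*"
    r"blockedFor=(?P<seconds>\d+)s"
)


def find_blocked_workers(log_text):
    """Return (worker, thread, seconds) for each blocked-thread report."""
    # Mail clients and log viewers may wrap long lines, so collapse
    # all whitespace into single spaces before matching.
    flat = " ".join(log_text.split())
    return [
        (m.group("worker"), m.group("thread"), int(m.group("seconds")))
        for m in BLOCKED_RE.finditer(flat)
    ]


if __name__ == "__main__":
    import sys

    # Usage: python find_blocked.py <path-to-log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for worker, thread, seconds in find_blocked_workers(f.read()):
            print(f"{worker} ({thread}) blocked for {seconds}s")
```

Sorting the results by the seconds value shows whether the block always ends
at roughly the same duration, which would point to a fixed timeout.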

My guess is that the checkpoint lock can't be acquired until some other
operation times out. It could be a network-related timeout or some operation
timeout.

So please check your configuration, find where you have a 3-minute timeout,
and check what that timeout relates to.
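
One way to search for such a timeout is to scan the Spring XML configuration
for timeout properties whose value is close to 3 minutes. This is only a
rough sketch: it assumes the timeouts are set as plain
<property name="...Timeout" value="..."/> entries in milliseconds, and the
function name and tolerance are made up for illustration. A timeout that is
left at its default, or that only adds up to 3 minutes through retries, will
not show up.

```python
import xml.etree.ElementTree as ET


def find_timeouts_near(xml_text, target_ms=180_000, tolerance_ms=30_000):
    """Return (property name, value in ms) pairs close to target_ms."""
    hits = []
    for elem in ET.fromstring(xml_text).iter():
        # Spring elements are namespaced, so compare only the local name.
        if elem.tag.rsplit("}", 1)[-1] != "property":
            continue
        name = elem.get("name", "")
        value = elem.get("value", "")
        # Assumption: timeout values are plain integers in milliseconds.
        if "timeout" in name.lower() and value.isdigit():
            if abs(int(value) - target_ms) <= tolerance_ms:
                hits.append((name, int(value)))
    return hits
```

It can be called on the contents of the node configuration file, for example
find_timeouts_near(open("ignite-config.xml").read()), where the file name is
a placeholder.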

BR,
Andrei


