Posted to issues@ignite.apache.org by "Anton Kalashnikov (Jira)" <ji...@apache.org> on 2020/02/21 12:42:00 UTC

[jira] [Updated] (IGNITE-12714) Absence of default value of IGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT

     [ https://issues.apache.org/jira/browse/IGNITE-12714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anton Kalashnikov updated IGNITE-12714:
---------------------------------------
    Description: 
Scenario:
1. Start 3 data nodes 
2. Start load with a streamer on 6 clients
3. Start data nodes restarter

Result:
Keys weren't loaded in all (1000) caches.
In the server node log I see:
{noformat}
[2019-07-17 16:52:36,881][ERROR][tcp-disco-msg-worker-#2] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=data-streamer-stripe-7, blockedFor=16s]
[2019-07-17 16:52:36,883][WARN ][tcp-disco-msg-worker-#2] Thread [name="data-streamer-stripe-7-#24", id=43, state=WAITING, blockCnt=111, waitCnt=169964]
[2019-07-17 16:52:36,885][ERROR][tcp-disco-msg-worker-#2] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-7, igniteInstanceName=null, finished=false, heartbeatTs=1563371540069]]]
org.apache.ignite.IgniteException: GridWorker [name=data-streamer-stripe-7, igniteInstanceName=null, finished=false, heartbeatTs=1563371540069]
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1838) ~[ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1833) ~[ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:230) ~[ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297) ~[ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2804) ~[ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7568) [ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2866) [ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7506) [ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) [ignite-core-2.5.9.jar:2.5.9]
{noformat}


*Solution:*
Increase the timeout to 2 min via org.apache.ignite.IgniteSystemProperties#IGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT.
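
As a workaround until the default changes, the property can be set by hand before the node starts. The sketch below is only an illustration and uses plain JDK calls. The class and method names are hypothetical, and it assumes the property value is read in milliseconds and that it is picked up as a JVM system property at node startup:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Ignite: sets the worker-blocked timeout
// as a JVM system property before the node is started.
public class WorkerBlockedTimeoutExample {
    // Property name taken from IgniteSystemProperties#IGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT.
    static final String PROP = "IGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT";

    // Converts minutes to milliseconds (assumed unit) and stores the value.
    static long applyTimeout(long minutes) {
        long millis = TimeUnit.MINUTES.toMillis(minutes);
        System.setProperty(PROP, Long.toString(millis));
        return millis;
    }

    public static void main(String[] args) {
        applyTimeout(2); // 2 min, as proposed in the solution above
        System.out.println(PROP + "=" + System.getProperty(PROP));
        // The node would be started after this point.
    }
}
```

Under the same assumptions, passing -DIGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT=120000 on the JVM command line should have the same effect.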

  was:
Scenario:
1. Start 3 data nodes 
2. Start load with a streamer on 6 clients
3. Start data nodes restarter

Result:
Keys weren't loaded in all (1000) caches.
In the server node log I see:
{noformat}
[2019-07-17 16:52:36,881][ERROR][tcp-disco-msg-worker-#2] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=data-streamer-stripe-7, blockedFor=16s]
[2019-07-17 16:52:36,883][WARN ][tcp-disco-msg-worker-#2] Thread [name="data-streamer-stripe-7-#24", id=43, state=WAITING, blockCnt=111, waitCnt=169964]
[2019-07-17 16:52:36,885][ERROR][tcp-disco-msg-worker-#2] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-7, igniteInstanceName=null, finished=false, heartbeatTs=1563371540069]]]
org.apache.ignite.IgniteException: GridWorker [name=data-streamer-stripe-7, igniteInstanceName=null, finished=false, heartbeatTs=1563371540069]
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1838) ~[ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1833) ~[ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:230) ~[ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297) ~[ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2804) ~[ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7568) [ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2866) [ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7506) [ignite-core-2.5.9.jar:2.5.9]
    at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) [ignite-core-2.5.9.jar:2.5.9]
{noformat}

Logs: ftp://gg@172.25.2.50/poc-tester-logs/1723/log-2019-07-17-17-33-23
Log with dumps: ftp://gg@172.25.2.50/poc-tester-logs/1723/log-2019-07-17-17-33-23/servers/172.25.1.12/poc-tester-server-172.25.1.12-id-0-2019-07-17-16-46-58.log-1-2019-07-17.log.gz


*Solution:*
Increase the timeout to 2 min via org.apache.ignite.IgniteSystemProperties#IGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT.


> Absence of default value of IGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT
> ----------------------------------------------------------------
>
>                 Key: IGNITE-12714
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12714
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Anton Kalashnikov
>            Assignee: Anton Kalashnikov
>            Priority: Major


