Posted to issues@ignite.apache.org by "Yaroslav Molochkov (Jira)" <ji...@apache.org> on 2020/11/03 15:08:00 UTC

[jira] [Updated] (IGNITE-13540) Exchange worker, waiting for new task from queue, considered as blocked.

     [ https://issues.apache.org/jira/browse/IGNITE-13540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yaroslav Molochkov updated IGNITE-13540:
----------------------------------------
    Fix Version/s: 2.9.1

> Exchange worker, waiting for new task from queue, considered as blocked.
> ------------------------------------------------------------------------
>
>                 Key: IGNITE-13540
>                 URL: https://issues.apache.org/jira/browse/IGNITE-13540
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.9, 2.8.1
>            Reporter: Ivan Daschinskiy
>            Assignee: Ivan Daschinskiy
>            Priority: Minor
>              Labels: 2.9.1-rc
>             Fix For: 2.10, 2.9.1
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Waiting for a new task in ExchangeWorker#body is currently not marked as a blocking section.
> So if the network timeout (the timeout for polling a task from the queue) is greater than the system worker blocked timeout, the idle exchange worker thread is considered blocked. Sometimes this is reported in the logs a few seconds after PME (partition map exchange) has actually finished:
> {noformat}
> [2020-10-06 16:55:45,939][INFO ][exchange-worker-#50][org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager1] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=6, minorTopVer=1], force=false, evt=DISCOVERY_CUSTOM_EVT, node=163fd0f0-b9a4-4317-a28f-f7dbdb776076]
> [2020-10-06 16:55:48,822][ERROR][tcp-disco-msg-worker-[9e18957a 172.18.0.5:47500]-#2-#44][org.apache.ignite.internal.util.typedef.G1] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [workerName=partition-exchanger, threadName=exchange-worker-#50, blockedFor=2s]
> [2020-10-06 16:55:48,824][WARN ][tcp-disco-msg-worker-[9e18957a 172.18.0.5:47500]-#2-#44][org.apache.ignite.internal.util.typedef.G1] Thread [name="exchange-worker-#50", id=90, state=TIMED_WAITING, blockCnt=20, waitCnt=48]
>     Lock [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@14f29e0e, ownerName=null, ownerId=-1]
> [2020-10-06 16:55:48,827][WARN ][tcp-disco-msg-worker-[9e18957a 172.18.0.5:47500]-#2-#44][root1] Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=partition-exchanger, igniteInstanceName=null, finished=false, heartbeatTs=1601992545941]]]
> class org.apache.ignite.IgniteException: GridWorker [name=partition-exchanger, igniteInstanceName=null, finished=false, heartbeatTs=1601992545941]
> 	at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$3.apply(IgnitionEx.java:1860)
> 	at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$3.apply(IgnitionEx.java:1855)
> 	at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:234)
> 	at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:299)
> {noformat}
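
The idea described above (an idle queue poll should not count as a blocked worker) can be illustrated with a small stand-alone sketch. It is not Ignite's actual code or the actual fix: the class MonitoredWorker, its method names, and the sentinel value are assumptions made for illustration only. The worker marks the timed poll as a blocking section, so a watchdog comparing the heartbeat timestamp against the blocked timeout does not flag the thread while it is merely waiting for a task.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a monitored system worker (not Ignite's GridWorker). */
class MonitoredWorker {
    /** Assumed sentinel: while set, the watchdog never sees the worker as blocked. */
    private static final long IN_BLOCKING_SECTION = Long.MAX_VALUE;

    /** Timestamp of the last sign of life, read by the watchdog thread. */
    private volatile long heartbeatTs = System.currentTimeMillis();

    /** Records that the worker is alive right now. */
    void updateHeartbeat() {
        heartbeatTs = System.currentTimeMillis();
    }

    /** Call before an operation that may legitimately wait for a long time. */
    void blockingSectionBegin() {
        heartbeatTs = IN_BLOCKING_SECTION;
    }

    /** Call after the wait; resumes normal heartbeat tracking. */
    void blockingSectionEnd() {
        updateHeartbeat();
    }

    /** Watchdog check: true if no heartbeat was seen for longer than the timeout. */
    boolean isBlocked(long nowMs, long blockedTimeoutMs) {
        return nowMs - heartbeatTs > blockedTimeoutMs;
    }

    /**
     * Polls the next task, treating the idle wait as a blocking section.
     * Returns null if the poll times out or the thread is interrupted.
     */
    <T> T pollTask(BlockingQueue<T> queue, long pollTimeoutMs) {
        blockingSectionBegin();
        try {
            return queue.poll(pollTimeoutMs, TimeUnit.MILLISECONDS);
        }
        catch (InterruptedException e) {
            // Preserve the interrupt status for the caller.
            Thread.currentThread().interrupt();
            return null;
        }
        finally {
            blockingSectionEnd();
        }
    }
}
```

Without the begin/end calls around the poll, a poll timeout longer than the blocked timeout would leave the heartbeat stale for the whole wait, which matches the false "blocked" report in the log above.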



--
This message was sent by Atlassian Jira
(v8.3.4#803005)