You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by "Sammi Chen (Jira)" <ji...@apache.org> on 2021/09/29 06:22:00 UTC

[jira] [Updated] (HDDS-5794) The misleading "No available thread in pool for past * sencond" log message in DN StateContext

     [ https://issues.apache.org/jira/browse/HDDS-5794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sammi Chen updated HDDS-5794:
-----------------------------
    Summary: The misleading "No available thread in pool for past * sencond" log message in DN StateContext  (was: The misleading "No available thread in pool for past * sencond" message in DN StateContext)

> The misleading "No available thread in pool for past * sencond" log message in DN StateContext
> ----------------------------------------------------------------------------------------------
>
>                 Key: HDDS-5794
>                 URL: https://issues.apache.org/jira/browse/HDDS-5794
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Sammi Chen
>            Assignee: Sammi Chen
>            Priority: Major
>
> There are a lot of following messages in DN log file,  which makes me think there is something wrong with the heartbeat channel between the DN and SCM.
> 2021-09-28 10:59:36,871 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
> 2021-09-28 10:59:37,345 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
> 2021-09-28 10:59:37,379 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
> 2021-09-28 10:59:37,584 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
> 2021-09-28 10:59:37,598 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
> 2021-09-28 10:59:37,631 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
> 2021-09-28 10:59:37,650 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
> 2021-09-28 10:59:37,652 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
> 2021-09-28 10:59:37,668 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
> 2021-09-28 10:59:37,741 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
> 2021-09-28 10:59:37,918 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
> 2021-09-28 10:59:37,925 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
> 2021-09-28 10:59:37,930 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
> 2021-09-28 10:59:37,954 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
> 2021-09-28 10:59:37,955 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
> 2021-09-28 10:59:37,995 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
> 2021-09-28 10:59:38,005 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
> 2021-09-28 10:59:38,097 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
> 2021-09-28 10:59:38,114 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
> 2021-09-28 10:59:38,174 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
> After add some extra debug info, I found the thing is not as the LOG shows,  the heartbeat channel is very healthy and this "No available thread in pool for past 30 seconds"  is quite misleading.   The corrent part of the LOG is "No available thread in pool",  the wrong part is "for the past 30 seconds". 
> 2021-09-29 11:28:01,257 [DatanodeStateMachineDaemonThread] INFO org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: heartbeat interval : 30000, next HB 6719157366
> 2021-09-29 11:28:01,257 [DatanodeStateMachineDaemonThread] INFO org.apache.hadoop.ozone.container.common.statemachine.StateContext: Starting to execute the task 24
> 2021-09-29 11:28:01,257 [EndpointStateMachine task thread for /9.186.21.245:9861 - 0 ] INFO org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: Sending heartbeat message
> 2021-09-29 11:28:01,261 [EndpointStateMachine task thread for /9.186.21.245:9861 - 0 ] INFO org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: Received heartbeat response
> 2021-09-29 11:28:01,261 [DatanodeStateMachineDaemonThread] INFO org.apache.hadoop.ozone.container.common.statemachine.StateContext: Finished executing the task 24
> 2021-09-29 11:28:01,261 [DatanodeStateMachineDaemonThread] INFO org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: Sleep 29996 ms
> 2021-09-29 11:28:05,622 [ChunkWriter-125-0] INFO org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: triggerHeartbeat: interrupt stateMachineThread for ICR report
> 2021-09-29 11:28:05,622 [ChunkWriter-71-0] INFO org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: triggerHeartbeat: interrupt stateMachineThread for ICR report
> 2021-09-29 11:28:05,622 [DatanodeStateMachineDaemonThread] INFO org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: heartbeat interval : 30000, next HB 6719161731
> 2021-09-29 11:28:05,622 [ChunkWriter-165-0] INFO org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: triggerHeartbeat: interrupt stateMachineThread for ICR report
> 2021-09-29 11:28:05,622 [ChunkWriter-53-0] INFO org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: triggerHeartbeat: interrupt stateMachineThread for ICR report
> 2021-09-29 11:28:05,622 [DatanodeStateMachineDaemonThread] INFO org.apache.hadoop.ozone.container.common.statemachine.StateContext: Starting to execute the task 25
> 2021-09-29 11:28:05,622 [ChunkWriter-31-0] INFO org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: triggerHeartbeat: interrupt stateMachineThread for ICR report
> 2021-09-29 11:28:05,622 [ChunkWriter-41-0] INFO org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: triggerHeartbeat: interrupt stateMachineThread for ICR report
> 2021-09-29 11:28:05,622 [ChunkWriter-25-0] INFO org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: triggerHeartbeat: interrupt stateMachineThread for ICR report
> 2021-09-29 11:28:05,622 [DatanodeStateMachineDaemonThread] ERROR org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: execute task.wait is interrupted.
> 2021-09-29 11:28:05,623 [DatanodeStateMachineDaemonThread] INFO org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: heartbeat interval : 30000, next HB 6719161732
> 2021-09-29 11:28:05,623 [DatanodeStateMachineDaemonThread] INFO org.apache.hadoop.ozone.container.common.statemachine.StateContext: Starting to execute the task 26
> 2021-09-29 11:28:05,623 [EndpointStateMachine task thread for /9.186.21.245:9861 - 0 ] INFO org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: Sending heartbeat message
> 2021-09-29 11:28:05,641 [EndpointStateMachine task thread for /9.186.21.245:9861 - 0 ] INFO org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: Received heartbeat response
> 2021-09-29 11:28:05,642 [EndpointStateMachine task thread for /9.186.21.245:9861 - 0 ] INFO org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: Sending heartbeat message
> 2021-09-29 11:28:05,652 [EndpointStateMachine task thread for /9.186.21.245:9861 - 0 ] INFO org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: Received heartbeat response
> 2021-09-29 11:28:05,652 [DatanodeStateMachineDaemonThread] INFO org.apache.hadoop.ozone.container.common.statemachine.StateContext: Finished executing the task 26
> 2021-09-29 11:28:05,652 [DatanodeStateMachineDaemonThread] INFO org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: Sleep 29971 ms
> 2021-09-29 11:28:05,709 [ChunkWriter-73-0] INFO org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: triggerHeartbeat: interrupt stateMachineThread for ICR report
> 2021-09-29 11:28:05,709 [DatanodeStateMachineDaemonThread] INFO org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: heartbeat interval : 30000, next HB 6719161818
> 2021-09-29 11:28:05,709 [DatanodeStateMachineDaemonThread] INFO org.apache.hadoop.ozone.container.common.statemachine.StateContext: Starting to execute the task 27
> 2021-09-29 11:28:05,709 [DatanodeStateMachineDaemonThread] ERROR org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: execute task.wait is interrupted.
> 2021-09-29 11:28:05,709 [DatanodeStateMachineDaemonThread] INFO org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: heartbeat interval : 30000, next HB 6719161818
> 2021-09-29 11:28:05,709 [DatanodeStateMachineDaemonThread] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org