Posted to issues@ozone.apache.org by "Hui Fei (Jira)" <ji...@apache.org> on 2022/10/20 07:51:00 UTC

[jira] [Commented] (HDDS-6900) SCM node went down with UndeclaredThrowableException while running container balancer

    [ https://issues.apache.org/jira/browse/HDDS-6900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620875#comment-17620875 ] 

Hui Fei commented on HDDS-6900:
-------------------------------

Added 1.3.0 as the fixed version too.

> SCM node went down with UndeclaredThrowableException while running container balancer
> -------------------------------------------------------------------------------------
>
>                 Key: HDDS-6900
>                 URL: https://issues.apache.org/jira/browse/HDDS-6900
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: SCM HA
>            Reporter: Nilotpal Nandi
>            Assignee: Siddhant Sangwan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.3.0, 1.2.2
>
>
> SCM node went down with UndeclaredThrowableException while the container balancer was running and 2 other SCM nodes were shut down.
> {noformat}
> 2022-06-15 20:00:15,634 WARN org.apache.ratis.grpc.server.GrpcLogAppender: 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF->cbdea5d3-682d-43e6-a17a-bdce757b7764-GrpcLogAppender: Leader has not got in touch with Follower 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF->cbdea5d3-682d-43e6-a17a-bdce757b7764(c-1,m0,n310, attendVote=true, lastRpcSendTime=1, lastRpcResponseTime=32843) yet, just keep nextIndex unchanged and retry.
> 2022-06-15 20:00:16,887 WARN org.apache.ratis.grpc.server.GrpcLogAppender: 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF->cbdea5d3-682d-43e6-a17a-bdce757b7764-AppendLogResponseHandler: Failed appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2022-06-15 20:00:16,888 WARN org.apache.ratis.grpc.server.GrpcLogAppender: 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF->cbdea5d3-682d-43e6-a17a-bdce757b7764-GrpcLogAppender: Leader has not got in touch with Follower 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF->cbdea5d3-682d-43e6-a17a-bdce757b7764(c-1,m0,n310, attendVote=true, lastRpcSendTime=4, lastRpcResponseTime=34097) yet, just keep nextIndex unchanged and retry.
> 2022-06-15 20:00:18,121 ERROR org.apache.ratis.server.impl.StateMachineUpdater: 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF-StateMachineUpdater caught a Throwable.
> java.lang.reflect.UndeclaredThrowableException
>     at com.sun.proxy.$Proxy19.completeMove(Unknown Source)
>     at org.apache.hadoop.hdds.scm.container.replication.LegacyReplicationManager.deleteSrcDnForMove(LegacyReplicationManager.java:1249)
>     at org.apache.hadoop.hdds.scm.container.replication.LegacyReplicationManager.lambda$onLeaderReadyAndOutOfSafeMode$40(LegacyReplicationManager.java:1871)
>     at java.base/java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1603)
>     at org.apache.hadoop.hdds.scm.container.replication.LegacyReplicationManager.onLeaderReadyAndOutOfSafeMode(LegacyReplicationManager.java:1850)
>     at org.apache.hadoop.hdds.scm.container.replication.LegacyReplicationManager.notifyStatusChanged(LegacyReplicationManager.java:1649)
>     at org.apache.hadoop.hdds.scm.container.replication.ReplicationManager.notifyStatusChanged(ReplicationManager.java:375)
>     at org.apache.hadoop.hdds.scm.ha.SCMServiceManager.notifyStatusChanged(SCMServiceManager.java:52)
>     at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.notifyTermIndexUpdated(SCMStateMachine.java:330)
>     at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1566)
>     at org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:239)
>     at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:182)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.util.concurrent.TimeoutException
>     at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
>     at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
>     at org.apache.hadoop.hdds.scm.ha.SCMRatisServerImpl.submitRequest(SCMRatisServerImpl.java:225)
>     at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invokeRatis(SCMHAInvocationHandler.java:111)
>     at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invoke(SCMHAInvocationHandler.java:67)
>     ... 13 more
> 2022-06-15 20:00:18,122 INFO org.apache.ratis.server.RaftServer$Division: 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF: shutdown
> 2022-06-15 20:00:18,122 INFO org.apache.ratis.util.JmxRegister: Successfully un-registered JMX Bean with object name Ratis:service=RaftServer,group=group-0B75F4A309CF,id=99c85376-060f-4b3c-8973-a2d2b1dd23e6
> 2022-06-15 20:00:18,122 INFO org.apache.ratis.server.impl.RoleInfo: 99c85376-060f-4b3c-8973-a2d2b1dd23e6: shutdown 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF-LeaderStateImpl
> 2022-06-15 20:00:18,124 INFO org.apache.ratis.server.impl.PendingRequests: 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF-PendingRequests: sendNotLeaderResponses
> 2022-06-15 20:00:18,125 WARN org.apache.ratis.grpc.server.GrpcLogAppender: 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF->b6382f07-de2e-4986-8275-9146e73360a6-GrpcLogAppender: Wait interrupted by java.lang.InterruptedException
> 2022-06-15 20:00:18,128 INFO org.apache.hadoop.hdds.scm.ha.SCMStateMachine: current leader SCM steps down.
> {noformat}
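
For readers unfamiliar with the exception type in the trace above: a java.lang.reflect.Proxy instance throws UndeclaredThrowableException when its InvocationHandler throws a checked exception that the proxied interface method does not declare. The trace is consistent with that pattern (a TimeoutException surfacing through $Proxy19.completeMove), though the sketch below is only a standalone illustration and not Ozone code. The MoveTracker interface, the class name, and the "container-1" argument are hypothetical stand-ins.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.lang.reflect.UndeclaredThrowableException;
import java.util.concurrent.TimeoutException;

public class UndeclaredThrowableDemo {

  // Hypothetical stand-in for a proxied interface. The method declares
  // no checked exceptions, which is what triggers the wrapping.
  interface MoveTracker {
    void completeMove(String containerId);
  }

  // Builds a proxy whose handler always fails with a checked
  // TimeoutException, simulating a request that timed out.
  static MoveTracker timingOutProxy() {
    InvocationHandler handler = (proxy, method, args) -> {
      throw new TimeoutException("simulated request timeout");
    };
    return (MoveTracker) Proxy.newProxyInstance(
        MoveTracker.class.getClassLoader(),
        new Class<?>[] {MoveTracker.class},
        handler);
  }

  // Calls the proxy and reports which exception the caller sees
  // and what its cause is.
  static String describeFailure() {
    try {
      timingOutProxy().completeMove("container-1");
      return "no failure";
    } catch (UndeclaredThrowableException e) {
      return e.getClass().getSimpleName() + " caused by "
          + e.getCause().getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    // Prints: UndeclaredThrowableException caused by TimeoutException
    System.out.println(describeFailure());
  }
}
```

Because UndeclaredThrowableException is unchecked, a caller that does not catch it explicitly lets it propagate up the stack, which would match the StateMachineUpdater "caught a Throwable" entry followed by the shutdown messages in the log.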


