Posted to issues@ozone.apache.org by "Kaijie Chen (Jira)" <ji...@apache.org> on 2023/03/31 07:52:00 UTC

[jira] [Commented] (HDDS-8343) Failed to elect leader due to Ratis group not found

    [ https://issues.apache.org/jira/browse/HDDS-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17707178#comment-17707178 ] 

Kaijie Chen commented on HDDS-8343:
-----------------------------------

Cc [~Sammi] and [~weichiu]  

> Failed to elect leader due to Ratis group not found
> ---------------------------------------------------
>
>                 Key: HDDS-8343
>                 URL: https://issues.apache.org/jira/browse/HDDS-8343
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Kaijie Chen
>            Priority: Major
>
> We encountered a problem during an upgrade.
>  
> Suppose we have 3 DataNodes forming a Ratis group.
> On DN1 and DN2, the pipeline was closed and the Ratis group has been deleted.
> On DN3, the Ratis group has not been deleted, so DN3 keeps starting elections but fails to elect a leader.
> In this situation, we cannot read data from this pipeline.
>  
> Here are the logs:
> {code:java}
> 2023-03-16 18:39:22,527 [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-FollowerState] INFO org.apache.ratis.server.RaftServer$Division: 207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C: changes role from  FOLLOWER to CANDIDATE at term 724 for changeToCandidate
> 2023-03-16 18:39:22,527 [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-FollowerState] ERROR org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis: pipeline Action CLOSE on pipeline PipelineID=1b0b8153-71fd-437a-b486-bbea4a4fba6c.Reason : 207b98d9-ad64-45a8-940f-504b514feff5 is in candidate state for 322168ms
> 2023-03-16 18:39:22,527 [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-FollowerState] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.leaderelection.pre-vote = false (custom)
> 2023-03-16 18:39:22,527 [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-FollowerState] INFO org.apache.ratis.server.impl.RoleInfo: 207b98d9-ad64-45a8-940f-504b514feff5: start 207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310
> 2023-03-16 18:39:22,554 [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310] INFO org.apache.ratis.server.impl.LeaderElection: 207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310 ELECTION round 0: submit vote requests at term 725 for 0: peers:[f8d9ccf6-20c6-4dfa-8a49-012f43a1b27e|rpc:9.179.142.251:9858|dataStream:|priority:0|startupRole:FOLLOWER, 33b49c34-caa2-4b4f-894e-dce7db4f97b9|rpc:9.180.20.222:9858|dataStream:|priority:1|startupRole:FOLLOWER, 207b98d9-ad64-45a8-940f-504b514feff5|rpc:9.180.21.88:9858|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[], old=null
> 2023-03-16 18:39:22,554 [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.rpc.first-election.timeout.min = 5s (fallback to raft.server.rpc.timeout.min)
> 2023-03-16 18:39:22,554 [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.rpc.first-election.timeout.max = 5200ms (fallback to raft.server.rpc.timeout.max)
> 2023-03-16 18:39:22,554 [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310] INFO org.apache.ratis.server.impl.LeaderElection: 207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310 got exception when requesting votes: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-03-16 18:39:22,556 [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310] INFO org.apache.ratis.server.impl.LeaderElection: 207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310 got exception when requesting votes: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: f8d9ccf6-20c6-4dfa-8a49-012f43a1b27e: group-BBEA4A4FBA6C not found.
> 2023-03-16 18:39:22,556 [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310] INFO org.apache.ratis.server.impl.LeaderElection: 207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310: ELECTION REJECTED received 0 response(s) and 2 exception(s):
> 2023-03-16 18:39:22,556 [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310] INFO org.apache.ratis.server.impl.LeaderElection:   Exception 0: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-03-16 18:39:22,556 [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310] INFO org.apache.ratis.server.impl.LeaderElection:   Exception 1: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: f8d9ccf6-20c6-4dfa-8a49-012f43a1b27e: group-BBEA4A4FBA6C not found.
> 2023-03-16 18:39:22,556 [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310] INFO org.apache.ratis.server.impl.LeaderElection: 207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310 ELECTION round 0: result REJECTED
> 2023-03-16 18:39:22,557 [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310] INFO org.apache.ratis.server.RaftServer$Division: 207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C: changes role from CANDIDATE to FOLLOWER at term 725 for REJECTED
> 2023-03-16 18:39:22,557 [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310] INFO org.apache.ratis.server.impl.RoleInfo: 207b98d9-ad64-45a8-940f-504b514feff5: shutdown 207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310
> 2023-03-16 18:39:22,557 [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-LeaderElection310] INFO org.apache.ratis.server.impl.RoleInfo: 207b98d9-ad64-45a8-940f-504b514feff5: start 207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-FollowerState
> 2023-03-16 18:39:22,557 [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-FollowerState] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.rpc.first-election.timeout.min = 5s (fallback to raft.server.rpc.timeout.min)
> 2023-03-16 18:39:22,557 [207b98d9-ad64-45a8-940f-504b514feff5@group-BBEA4A4FBA6C-FollowerState] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.rpc.first-election.timeout.max = 5200ms (fallback to raft.server.rpc.timeout.max)
> 2023-03-16 18:39:25,688 [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState] INFO org.apache.ratis.server.impl.FollowerState: 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState: change to CANDIDATE, lastRpcElapsedTime:5189254060ns, electionTimeout:5189ms
> 2023-03-16 18:39:25,688 [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState] INFO org.apache.ratis.server.impl.RoleInfo: 207b98d9-ad64-45a8-940f-504b514feff5: shutdown 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState
> 2023-03-16 18:39:25,688 [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState] INFO org.apache.ratis.server.RaftServer$Division: 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D: changes role from  FOLLOWER to CANDIDATE at term 706 for changeToCandidate
> 2023-03-16 18:39:25,688 [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState] ERROR org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis: pipeline Action CLOSE on pipeline PipelineID=4a5ed735-c797-45e8-a8e5-c8992d1fb40d.Reason : 207b98d9-ad64-45a8-940f-504b514feff5 is in candidate state for 325322ms
> 2023-03-16 18:39:25,688 [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.leaderelection.pre-vote = false (custom)
> 2023-03-16 18:39:25,688 [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState] INFO org.apache.ratis.server.impl.RoleInfo: 207b98d9-ad64-45a8-940f-504b514feff5: start 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311
> 2023-03-16 18:39:25,723 [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311] INFO org.apache.ratis.server.impl.LeaderElection: 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311 ELECTION round 0: submit vote requests at term 707 for 0: peers:[1e40274c-a4bd-4e3d-8479-59f8105ec408|rpc:100.76.18.99:9858|dataStream:|priority:1|startupRole:FOLLOWER, 207b98d9-ad64-45a8-940f-504b514feff5|rpc:9.180.21.88:9858|dataStream:|priority:0|startupRole:FOLLOWER, bcdf3bd5-7b8e-435d-b3fa-b3e29f0eb307|rpc:9.180.5.41:9858|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[], old=null
> 2023-03-16 18:39:25,723 [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.rpc.first-election.timeout.min = 5s (fallback to raft.server.rpc.timeout.min)
> 2023-03-16 18:39:25,723 [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.rpc.first-election.timeout.max = 5200ms (fallback to raft.server.rpc.timeout.max)
> 2023-03-16 18:39:25,723 [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311] INFO org.apache.ratis.server.impl.LeaderElection: 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311 got exception when requesting votes: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-03-16 18:39:25,723 [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311] INFO org.apache.ratis.server.impl.LeaderElection: 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311 got exception when requesting votes: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-03-16 18:39:25,723 [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311] INFO org.apache.ratis.server.impl.LeaderElection: 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311: ELECTION REJECTED received 0 response(s) and 2 exception(s):
> 2023-03-16 18:39:25,723 [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311] INFO org.apache.ratis.server.impl.LeaderElection:   Exception 0: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-03-16 18:39:25,723 [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311] INFO org.apache.ratis.server.impl.LeaderElection:   Exception 1: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-03-16 18:39:25,724 [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311] INFO org.apache.ratis.server.impl.LeaderElection: 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311 ELECTION round 0: result REJECTED
> 2023-03-16 18:39:25,724 [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311] INFO org.apache.ratis.server.RaftServer$Division: 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D: changes role from CANDIDATE to FOLLOWER at term 707 for REJECTED
> 2023-03-16 18:39:25,724 [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311] INFO org.apache.ratis.server.impl.RoleInfo: 207b98d9-ad64-45a8-940f-504b514feff5: shutdown 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311
> 2023-03-16 18:39:25,724 [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-LeaderElection311] INFO org.apache.ratis.server.impl.RoleInfo: 207b98d9-ad64-45a8-940f-504b514feff5: start 207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState
> 2023-03-16 18:39:25,724 [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.rpc.first-election.timeout.min = 5s (fallback to raft.server.rpc.timeout.min)
> 2023-03-16 18:39:25,724 [207b98d9-ad64-45a8-940f-504b514feff5@group-C8992D1FB40D-FollowerState] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.rpc.first-election.timeout.max = 5200ms (fallback to raft.server.rpc.timeout.max)
> 2023-03-16 18:39:26,439 [207b98d9-ad64-45a8-940f-504b514feff5@group-3C37EA22A9C1-FollowerState] INFO org.apache.ratis.server.impl.FollowerState: 207b98d9-ad64-45a8-940f-504b514feff5@group-3C37EA22A9C1-FollowerState: change to CANDIDATE, lastRpcElapsedTime:5018766985ns, electionTimeout:5018ms
> 2023-03-16 18:39:26,439 [207b98d9-ad64-45a8-940f-504b514feff5@group-3C37EA22A9C1-FollowerState] INFO org.apache.ratis.server.impl.RoleInfo: 207b98d9-ad64-45a8-940f-504b514feff5: shutdown 207b98d9-ad64-45a8-940f-504b514feff5@group-3C37EA22A9C1-FollowerState
> 2023-03-16 18:39:26,439 [207b98d9-ad64-45a8-940f-504b514feff5@group-3C37EA22A9C1-FollowerState] INFO org.apache.ratis.server.RaftServer$Division: 207b98d9-ad64-45a8-940f-504b514feff5@group-3C37EA22A9C1: changes role from  FOLLOWER to CANDIDATE at term 706 for changeToCandidate
> 2023-03-16 18:39:26,439 [207b98d9-ad64-45a8-940f-504b514feff5@group-3C37EA22A9C1-FollowerState] ERROR org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis: pipeline Action CLOSE on pipeline PipelineID=5d8061d6-1692-4eb4-a604-3c37ea22a9c1.Reason : 207b98d9-ad64-45a8-940f-504b514feff5 is in candidate state for 326082ms
> 2023-03-16 18:39:26,439 [207b98d9-ad64-45a8-940f-504b514feff5@group-3C37EA22A9C1-FollowerState] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.leaderelection.pre-vote = false (custom)
> 2023-03-16 18:39:26,439 [207b98d9-ad64-45a8-940f-504b514feff5@group-3C37EA22A9C1-FollowerState] INFO org.apache.ratis.server.impl.RoleInfo: 207b98d9-ad64-45a8-940f-504b514feff5: start 207b98d9-ad64-45a8-940f-504b514feff5@group-3C37EA22A9C1-LeaderElection312
> {code}
>  
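
For readers less familiar with Raft, the reason a single remaining node cannot elect itself can be sketched with plain majority arithmetic. This is only an illustration of the standard Raft majority rule, not Ratis or Ozone code: the class and method names (QuorumSketch, majority, canWin) are hypothetical, and details that Ratis also considers, such as peer priority, are not modeled.

```java
// Hypothetical helper, NOT part of Ratis or Ozone: plain Raft majority arithmetic.
final class QuorumSketch {

  /** Smallest number of votes that forms a majority of groupSize peers. */
  static int majority(int groupSize) {
    return groupSize / 2 + 1;
  }

  /**
   * A candidate votes for itself, so it wins only if its own vote plus the
   * votes granted by the other peers reach a majority of the group.
   */
  static boolean canWin(int groupSize, int votesFromOthers) {
    return 1 + votesFromOthers >= majority(groupSize);
  }

  public static void main(String[] args) {
    // Three peers, as in the report. The two other peers answer the vote
    // request with exceptions, so the candidate receives no votes from them.
    System.out.println("majority of 3 = " + majority(3));
    System.out.println("0 votes from others -> can win: " + canWin(3, 0));
    System.out.println("1 vote from others  -> can win: " + canWin(3, 1));
  }
}
```

With "received 0 response(s) and 2 exception(s)" as in the logs above, the candidate only has its own vote, which is below the majority of 2 needed in a 3-peer group, so every election round is rejected.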



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
