Posted to issues@ratis.apache.org by "Tsz-wo Sze (Jira)" <ji...@apache.org> on 2023/05/04 16:34:00 UTC
[jira] [Commented] (RATIS-1803) GrpcLogAppender can't resolve host in kubernetes cluster
[ https://issues.apache.org/jira/browse/RATIS-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719416#comment-17719416 ]
Tsz-wo Sze commented on RATIS-1803:
-----------------------------------
[~liuyaolong], is this still a problem?
> GrpcLogAppender can't resolve host in kubernetes cluster
> --------------------------------------------------------
>
> Key: RATIS-1803
> URL: https://issues.apache.org/jira/browse/RATIS-1803
> Project: Ratis
> Issue Type: Bug
> Affects Versions: 2.4.1
> Reporter: Yaolong Liu
> Priority: Major
>
> In a Kubernetes (k8s) container environment, the candidate cannot resolve the host of one of the followers during the election. After the election succeeds, the leader still cannot resolve that follower's host, so it fails to send heartbeats. The follower keeps initiating pre-votes, but they are always rejected. After half an hour, the cluster returns to normal.
> The leader log:
> {code:java}
> 2023-03-02 10:26:11,909 INFO RoleInfo - alluxio-master-1_19200: start alluxio-master-1_19200@group-ABB3109A44C1-LeaderElection1
> 2023-03-02 10:26:11,910 INFO LeaderElection - alluxio-master-1_19200@group-ABB3109A44C1-LeaderElection1 PRE_VOTE round 0: submit vote requests at term 0 for -1: peers:[alluxio-master-0_19200|rpc:alluxio-master-0:19200|priority:0|startupRole:FOLLOWER, alluxio-master-2_19200|rpc:alluxio-master-2:19200|priority:0|startupRole:FOLLOWER, alluxio-master-1_19200|rpc:alluxio-master-1:19200|priority:0|startupRole:FOLLOWER]|listeners:[], old=null
> 2023-03-02 10:26:11,914 INFO RaftServerConfigKeys - raft.server.rpc.first-election.timeout.min = 10000ms (fallback to raft.server.rpc.timeout.min)
> 2023-03-02 10:26:11,914 INFO RaftServerConfigKeys - raft.server.rpc.first-election.timeout.max = 20000ms (fallback to raft.server.rpc.timeout.max)
> 2023-03-02 10:26:11,915 INFO GrpcServerProtocolClient - Build channel for alluxio-master-0_19200
> 2023-03-02 10:26:11,915 INFO GrpcServerProtocolClient - Build channel for alluxio-master-2_19200
> 2023-03-02 10:26:11,920 INFO LeaderElection - alluxio-master-1_19200@group-ABB3109A44C1-LeaderElection1 got exception when requesting votes: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: Unable to resolve host alluxio-master-0
> 2023-03-02 10:26:11,945 INFO LeaderElection - alluxio-master-1_19200@group-ABB3109A44C1-LeaderElection1: PRE_VOTE PASSED received 1 response(s) and 1 exception(s):
> 2023-03-02 10:26:11,945 INFO LeaderElection - Response 0: alluxio-master-1_19200<-alluxio-master-2_19200#0:OK-t0
> 2023-03-02 10:26:11,945 INFO LeaderElection - Exception 1: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: Unable to resolve host alluxio-master-0
> 2023-03-02 10:26:11,945 INFO LeaderElection - alluxio-master-1_19200@group-ABB3109A44C1-LeaderElection1 PRE_VOTE round 0: result PASSED
> 2023-03-02 10:26:11,948 INFO LeaderElection - alluxio-master-1_19200@group-ABB3109A44C1-LeaderElection1 ELECTION round 0: submit vote requests at term 1 for -1: peers:[alluxio-master-0_19200|rpc:alluxio-master-0:19200|priority:0|startupRole:FOLLOWER, alluxio-master-2_19200|rpc:alluxio-master-2:19200|priority:0|startupRole:FOLLOWER, alluxio-master-1_19200|rpc:alluxio-master-1:19200|priority:0|startupRole:FOLLOWER]|listeners:[], old=null
> 2023-03-02 10:26:11,948 INFO RaftServerConfigKeys - raft.server.rpc.first-election.timeout.min = 10000ms (fallback to raft.server.rpc.timeout.min)
> 2023-03-02 10:26:11,948 INFO RaftServerConfigKeys - raft.server.rpc.first-election.timeout.max = 20000ms (fallback to raft.server.rpc.timeout.max)
> 2023-03-02 10:26:11,948 INFO LeaderElection - alluxio-master-1_19200@group-ABB3109A44C1-LeaderElection1 got exception when requesting votes: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: Unable to resolve host alluxio-master-0
> 2023-03-02 10:26:11,961 INFO LeaderElection - alluxio-master-1_19200@group-ABB3109A44C1-LeaderElection1: ELECTION PASSED received 1 response(s) and 1 exception(s):
> 2023-03-02 10:26:11,961 INFO LeaderElection - Response 0: alluxio-master-1_19200<-alluxio-master-2_19200#0:OK-t1
> 2023-03-02 10:26:11,961 INFO LeaderElection - Exception 1: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: Unable to resolve host alluxio-master-0
> 2023-03-02 10:26:11,961 INFO LeaderElection - alluxio-master-1_19200@group-ABB3109A44C1-LeaderElection1 ELECTION round 0: result PASSED
> ....
> 2023-03-02 10:27:29,535 WARN GrpcLogAppender - alluxio-master-1_19200@group-ABB3109A44C1->alluxio-master-0_19200-GrpcLogAppender: Leader has not got in touch with Follower alluxio-master-1_19200@group-ABB3109A44C1->alluxio-master-0_19200(c-1,m0,n9, attendVote=true, lastRpcSendTime=0, lastRpcResponseTime=97556) yet, just keep nextIndex unchanged and retry.
> 2023-03-02 10:27:32,035 WARN GrpcLogAppender - alluxio-master-1_19200@group-ABB3109A44C1->alluxio-master-0_19200-AppendLogResponseHandler: Failed appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: Unable to resolve host alluxio-master-0
> 2023-03-02 10:27:32,035 WARN GrpcLogAppender - alluxio-master-1_19200@group-ABB3109A44C1->alluxio-master-0_19200-AppendLogResponseHandler: Failed appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: Unable to resolve host alluxio-master-0
> 2023-03-02 10:27:32,035 WARN GrpcLogAppender - alluxio-master-1_19200@group-ABB3109A44C1->alluxio-master-0_19200-GrpcLogAppender: Leader has not got in touch with Follower alluxio-master-1_19200@group-ABB3109A44C1->alluxio-master-0_19200(c-1,m0,n9, attendVote=true, lastRpcSendTime=0, lastRpcResponseTime=100056) yet, just keep nextIndex unchanged and retry.
> 2023-03-02 10:27:32,035 WARN GrpcLogAppender - alluxio-master-1_19200@group-ABB3109A44C1->alluxio-master-0_19200-GrpcLogAppender: Leader has not got in touch with Follower alluxio-master-1_19200@group-ABB3109A44C1->alluxio-master-0_19200(c-1,m0,n9, attendVote=true, lastRpcSendTime=0, lastRpcResponseTime=100057) yet, just keep nextIndex unchanged and retry.
> 2023-03-02 10:27:33,444 INFO VoteContext - alluxio-master-1_19200@group-ABB3109A44C1-LEADER: reject PRE_VOTE from alluxio-master-0_19200: this server is the leader and still has leadership
> 2023-03-02 10:27:34,535 WARN GrpcLogAppender - alluxio-master-1_19200@group-ABB3109A44C1->alluxio-master-0_19200-AppendLogResponseHandler: Failed appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: Unable to resolve host alluxio-master-0
> 2023-03-02 10:27:34,535 WARN GrpcLogAppender - alluxio-master-1_19200@group-ABB3109A44C1->alluxio-master-0_19200-AppendLogResponseHandler: Failed appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: Unable to resolve host alluxio-master-0
> {code}
> The follower log:
> {code:java}
> 2023-03-02 10:27:21,985 INFO RaftServerConfigKeys - raft.server.leaderelection.pre-vote = true (default)
> 2023-03-02 10:27:21,985 INFO RoleInfo - alluxio-master-0_19200: start alluxio-master-0_19200@group-ABB3109A44C1-LeaderElection6
> 2023-03-02 10:27:21,986 INFO LeaderElection - alluxio-master-0_19200@group-ABB3109A44C1-LeaderElection6 PRE_VOTE round 0: submit vote requests at term 1 for -1: peers:[alluxio-master-0_19200|rpc:alluxio-master-0:19200|priority:0|startupRole:FOLLOWER, alluxio-master-2_19200|rpc:alluxio-master-2:19200|priority:0|startupRole:FOLLOWER, alluxio-master-1_19200|rpc:alluxio-master-1:19200|priority:0|startupRole:FOLLOWER]|listeners:[], old=null
> 2023-03-02 10:27:21,987 INFO RaftServerConfigKeys - raft.server.rpc.first-election.timeout.min = 10000ms (fallback to raft.server.rpc.timeout.min)
> 2023-03-02 10:27:21,987 INFO RaftServerConfigKeys - raft.server.rpc.first-election.timeout.max = 20000ms (fallback to raft.server.rpc.timeout.max)
> 2023-03-02 10:27:22,019 INFO LeaderElection - alluxio-master-0_19200@group-ABB3109A44C1-LeaderElection6: PRE_VOTE REJECTED received 2 response(s) and 0 exception(s):
> 2023-03-02 10:27:22,019 INFO LeaderElection - Response 0: alluxio-master-0_19200<-alluxio-master-2_19200#0:FAIL-t1
> 2023-03-02 10:27:22,019 INFO LeaderElection - Response 1: alluxio-master-0_19200<-alluxio-master-1_19200#0:FAIL-t1
> 2023-03-02 10:27:22,019 INFO LeaderElection - alluxio-master-0_19200@group-ABB3109A44C1-LeaderElection6 PRE_VOTE round 0: result REJECTED
> 2023-03-02 10:27:22,019 INFO RoleInfo - alluxio-master-0_19200: shutdown alluxio-master-0_19200@group-ABB3109A44C1-LeaderElection6
> 2023-03-02 10:27:22,020 INFO RoleInfo - alluxio-master-0_19200: start alluxio-master-0_19200@group-ABB3109A44C1-FollowerState
> 2023-03-02 10:27:22,021 INFO RaftServerConfigKeys - raft.server.rpc.first-election.timeout.min = 10000ms (fallback to raft.server.rpc.timeout.min)
> 2023-03-02 10:27:22,021 INFO RaftServerConfigKeys - raft.server.rpc.first-election.timeout.max = 20000ms (fallback to raft.server.rpc.timeout.max)
> 2023-03-02 10:27:33,412 INFO FollowerState - alluxio-master-0_19200@group-ABB3109A44C1-FollowerState: change to CANDIDATE, lastRpcElapsedTime:11392399572ns, electionTimeout:11391ms
> 2023-03-02 10:27:33,412 INFO RoleInfo - alluxio-master-0_19200: shutdown alluxio-master-0_19200@group-ABB3109A44C1-FollowerState
> 2023-03-02 10:27:33,413 INFO RaftServerConfigKeys - raft.server.leaderelection.pre-vote = true (default)
> 2023-03-02 10:27:33,413 INFO RoleInfo - alluxio-master-0_19200: start alluxio-master-0_19200@group-ABB3109A44C1-LeaderElection7
> 2023-03-02 10:27:33,414 INFO LeaderElection - alluxio-master-0_19200@group-ABB3109A44C1-LeaderElection7 PRE_VOTE round 0: submit vote requests at term 1 for -1: peers:[alluxio-master-0_19200|rpc:alluxio-master-0:19200|priority:0|startupRole:FOLLOWER, alluxio-master-2_19200|rpc:alluxio-master-2:19200|priority:0|startupRole:FOLLOWER, alluxio-master-1_19200|rpc:alluxio-master-1:19200|priority:0|startupRole:FOLLOWER]|listeners:[], old=null
> 2023-03-02 10:27:33,414 INFO RaftServerConfigKeys - raft.server.rpc.first-election.timeout.min = 10000ms (fallback to raft.server.rpc.timeout.min)
> 2023-03-02 10:27:33,415 INFO RaftServerConfigKeys - raft.server.rpc.first-election.timeout.max = 20000ms (fallback to raft.server.rpc.timeout.max)
> 2023-03-02 10:27:33,446 INFO LeaderElection - alluxio-master-0_19200@group-ABB3109A44C1-LeaderElection7: PRE_VOTE REJECTED received 2 response(s) and 0 exception(s):
> {code}
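To help confirm whether the "Unable to resolve host" errors quoted above come from DNS inside the pod rather than from Ratis itself, a small standalone check along the following lines could be run in the leader's container. This is only a diagnostic sketch: the class name PeerDnsCheck, the method resolveWithRetry, and the retry settings are hypothetical and not part of Ratis or gRPC. It uses only the JDK resolver, so it does not reproduce the name resolver used by the shaded gRPC client.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Optional;

/** Diagnostic sketch (hypothetical helper, not Ratis code): checks whether peer hostnames resolve. */
class PeerDnsCheck {

  /**
   * Tries to resolve the host up to maxAttempts times, sleeping retryDelayMs after each failure.
   * Returns the first resolved address, or empty if no attempt succeeded.
   * Note: the JVM may cache failed lookups (security property networkaddress.cache.negative.ttl),
   * so retries inside that window may not reach the DNS server again.
   */
  static Optional<InetAddress> resolveWithRetry(String host, int maxAttempts, long retryDelayMs) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return Optional.of(InetAddress.getByName(host));
      } catch (UnknownHostException e) {
        System.err.printf("attempt %d/%d: cannot resolve %s%n", attempt, maxAttempts, host);
        if (attempt < maxAttempts) {
          try {
            Thread.sleep(retryDelayMs);
          } catch (InterruptedException ie) {
            // Preserve the interrupt flag and give up early.
            Thread.currentThread().interrupt();
            return Optional.empty();
          }
        }
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Default peer hostnames are taken from the logs above; pass others as arguments.
    String[] hosts = args.length > 0 ? args
        : new String[] {"alluxio-master-0", "alluxio-master-1", "alluxio-master-2"};
    for (String host : hosts) {
      Optional<InetAddress> address = resolveWithRetry(host, 5, 2000L);
      System.out.println(host + " -> "
          + address.map(InetAddress::getHostAddress).orElse("UNRESOLVED"));
    }
  }
}
```

If a peer stays UNRESOLVED here while the other peers resolve, that would point at the DNS record for that pod not being available yet, rather than at GrpcLogAppender.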