Posted to issues@ratis.apache.org by "Tsz-wo Sze (Jira)" <ji...@apache.org> on 2023/05/04 16:34:00 UTC

[jira] [Commented] (RATIS-1803) GrpcLogAppender can't resolve host in kubernetes cluster

    [ https://issues.apache.org/jira/browse/RATIS-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719416#comment-17719416 ] 

Tsz-wo Sze commented on RATIS-1803:
-----------------------------------

[~liuyaolong], is this still a problem?

> GrpcLogAppender can't resolve host in kubernetes cluster
> --------------------------------------------------------
>
>                 Key: RATIS-1803
>                 URL: https://issues.apache.org/jira/browse/RATIS-1803
>             Project: Ratis
>          Issue Type: Bug
>    Affects Versions: 2.4.1
>            Reporter: Yaolong Liu
>            Priority: Major
>
> In a Kubernetes (k8s) container environment, the candidate cannot resolve the hostname of one of the followers during leader election. After the election succeeds, the leader still cannot resolve that follower's hostname, so it fails to send heartbeats to it. The affected follower keeps initiating pre-votes, but they are always rejected. After half an hour, the cluster returns to normal.
> Leader log:
> {code:java}
> 2023-03-02 10:26:11,909 INFO  RoleInfo - alluxio-master-1_19200: start alluxio-master-1_19200@group-ABB3109A44C1-LeaderElection1
> 2023-03-02 10:26:11,910 INFO  LeaderElection - alluxio-master-1_19200@group-ABB3109A44C1-LeaderElection1 PRE_VOTE round 0: submit vote requests at term 0 for -1: peers:[alluxio-master-0_19200|rpc:alluxio-master-0:19200|priority:0|startupRole:FOLLOWER, alluxio-master-2_19200|rpc:alluxio-master-2:19200|priority:0|startupRole:FOLLOWER, alluxio-master-1_19200|rpc:alluxio-master-1:19200|priority:0|startupRole:FOLLOWER]|listeners:[], old=null
> 2023-03-02 10:26:11,914 INFO  RaftServerConfigKeys - raft.server.rpc.first-election.timeout.min = 10000ms (fallback to raft.server.rpc.timeout.min)
> 2023-03-02 10:26:11,914 INFO  RaftServerConfigKeys - raft.server.rpc.first-election.timeout.max = 20000ms (fallback to raft.server.rpc.timeout.max)
> 2023-03-02 10:26:11,915 INFO  GrpcServerProtocolClient - Build channel for alluxio-master-0_19200
> 2023-03-02 10:26:11,915 INFO  GrpcServerProtocolClient - Build channel for alluxio-master-2_19200
> 2023-03-02 10:26:11,920 INFO  LeaderElection - alluxio-master-1_19200@group-ABB3109A44C1-LeaderElection1 got exception when requesting votes: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: Unable to resolve host alluxio-master-0
> 2023-03-02 10:26:11,945 INFO  LeaderElection - alluxio-master-1_19200@group-ABB3109A44C1-LeaderElection1: PRE_VOTE PASSED received 1 response(s) and 1 exception(s):
> 2023-03-02 10:26:11,945 INFO  LeaderElection -   Response 0: alluxio-master-1_19200<-alluxio-master-2_19200#0:OK-t0
> 2023-03-02 10:26:11,945 INFO  LeaderElection -   Exception 1: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: Unable to resolve host alluxio-master-0
> 2023-03-02 10:26:11,945 INFO  LeaderElection - alluxio-master-1_19200@group-ABB3109A44C1-LeaderElection1 PRE_VOTE round 0: result PASSED
> 2023-03-02 10:26:11,948 INFO  LeaderElection - alluxio-master-1_19200@group-ABB3109A44C1-LeaderElection1 ELECTION round 0: submit vote requests at term 1 for -1: peers:[alluxio-master-0_19200|rpc:alluxio-master-0:19200|priority:0|startupRole:FOLLOWER, alluxio-master-2_19200|rpc:alluxio-master-2:19200|priority:0|startupRole:FOLLOWER, alluxio-master-1_19200|rpc:alluxio-master-1:19200|priority:0|startupRole:FOLLOWER]|listeners:[], old=null
> 2023-03-02 10:26:11,948 INFO  RaftServerConfigKeys - raft.server.rpc.first-election.timeout.min = 10000ms (fallback to raft.server.rpc.timeout.min)
> 2023-03-02 10:26:11,948 INFO  RaftServerConfigKeys - raft.server.rpc.first-election.timeout.max = 20000ms (fallback to raft.server.rpc.timeout.max)
> 2023-03-02 10:26:11,948 INFO  LeaderElection - alluxio-master-1_19200@group-ABB3109A44C1-LeaderElection1 got exception when requesting votes: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: Unable to resolve host alluxio-master-0
> 2023-03-02 10:26:11,961 INFO  LeaderElection - alluxio-master-1_19200@group-ABB3109A44C1-LeaderElection1: ELECTION PASSED received 1 response(s) and 1 exception(s):
> 2023-03-02 10:26:11,961 INFO  LeaderElection -   Response 0: alluxio-master-1_19200<-alluxio-master-2_19200#0:OK-t1
> 2023-03-02 10:26:11,961 INFO  LeaderElection -   Exception 1: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: Unable to resolve host alluxio-master-0
> 2023-03-02 10:26:11,961 INFO  LeaderElection - alluxio-master-1_19200@group-ABB3109A44C1-LeaderElection1 ELECTION round 0: result PASSED
> ....
> 2023-03-02 10:27:29,535 WARN  GrpcLogAppender - alluxio-master-1_19200@group-ABB3109A44C1->alluxio-master-0_19200-GrpcLogAppender: Leader has not got in touch with Follower alluxio-master-1_19200@group-ABB3109A44C1->alluxio-master-0_19200(c-1,m0,n9, attendVote=true, lastRpcSendTime=0, lastRpcResponseTime=97556) yet, just keep nextIndex unchanged and retry.
> 2023-03-02 10:27:32,035 WARN  GrpcLogAppender - alluxio-master-1_19200@group-ABB3109A44C1->alluxio-master-0_19200-AppendLogResponseHandler: Failed appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: Unable to resolve host alluxio-master-0
> 2023-03-02 10:27:32,035 WARN  GrpcLogAppender - alluxio-master-1_19200@group-ABB3109A44C1->alluxio-master-0_19200-AppendLogResponseHandler: Failed appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: Unable to resolve host alluxio-master-0
> 2023-03-02 10:27:32,035 WARN  GrpcLogAppender - alluxio-master-1_19200@group-ABB3109A44C1->alluxio-master-0_19200-GrpcLogAppender: Leader has not got in touch with Follower alluxio-master-1_19200@group-ABB3109A44C1->alluxio-master-0_19200(c-1,m0,n9, attendVote=true, lastRpcSendTime=0, lastRpcResponseTime=100056) yet, just keep nextIndex unchanged and retry.
> 2023-03-02 10:27:32,035 WARN  GrpcLogAppender - alluxio-master-1_19200@group-ABB3109A44C1->alluxio-master-0_19200-GrpcLogAppender: Leader has not got in touch with Follower alluxio-master-1_19200@group-ABB3109A44C1->alluxio-master-0_19200(c-1,m0,n9, attendVote=true, lastRpcSendTime=0, lastRpcResponseTime=100057) yet, just keep nextIndex unchanged and retry.
> 2023-03-02 10:27:33,444 INFO  VoteContext - alluxio-master-1_19200@group-ABB3109A44C1-LEADER: reject PRE_VOTE from alluxio-master-0_19200: this server is the leader and still has leadership
> 2023-03-02 10:27:34,535 WARN  GrpcLogAppender - alluxio-master-1_19200@group-ABB3109A44C1->alluxio-master-0_19200-AppendLogResponseHandler: Failed appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: Unable to resolve host alluxio-master-0
> 2023-03-02 10:27:34,535 WARN  GrpcLogAppender - alluxio-master-1_19200@group-ABB3109A44C1->alluxio-master-0_19200-AppendLogResponseHandler: Failed appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: Unable to resolve host alluxio-master-0
> {code}
> The follower log:
> {code:java}
> 2023-03-02 10:27:21,985 INFO  RaftServerConfigKeys - raft.server.leaderelection.pre-vote = true (default)
> 2023-03-02 10:27:21,985 INFO  RoleInfo - alluxio-master-0_19200: start alluxio-master-0_19200@group-ABB3109A44C1-LeaderElection6
> 2023-03-02 10:27:21,986 INFO  LeaderElection - alluxio-master-0_19200@group-ABB3109A44C1-LeaderElection6 PRE_VOTE round 0: submit vote requests at term 1 for -1: peers:[alluxio-master-0_19200|rpc:alluxio-master-0:19200|priority:0|startupRole:FOLLOWER, alluxio-master-2_19200|rpc:alluxio-master-2:19200|priority:0|startupRole:FOLLOWER, alluxio-master-1_19200|rpc:alluxio-master-1:19200|priority:0|startupRole:FOLLOWER]|listeners:[], old=null
> 2023-03-02 10:27:21,987 INFO  RaftServerConfigKeys - raft.server.rpc.first-election.timeout.min = 10000ms (fallback to raft.server.rpc.timeout.min)
> 2023-03-02 10:27:21,987 INFO  RaftServerConfigKeys - raft.server.rpc.first-election.timeout.max = 20000ms (fallback to raft.server.rpc.timeout.max)
> 2023-03-02 10:27:22,019 INFO  LeaderElection - alluxio-master-0_19200@group-ABB3109A44C1-LeaderElection6: PRE_VOTE REJECTED received 2 response(s) and 0 exception(s):
> 2023-03-02 10:27:22,019 INFO  LeaderElection -   Response 0: alluxio-master-0_19200<-alluxio-master-2_19200#0:FAIL-t1
> 2023-03-02 10:27:22,019 INFO  LeaderElection -   Response 1: alluxio-master-0_19200<-alluxio-master-1_19200#0:FAIL-t1
> 2023-03-02 10:27:22,019 INFO  LeaderElection - alluxio-master-0_19200@group-ABB3109A44C1-LeaderElection6 PRE_VOTE round 0: result REJECTED
> 2023-03-02 10:27:22,019 INFO  RoleInfo - alluxio-master-0_19200: shutdown alluxio-master-0_19200@group-ABB3109A44C1-LeaderElection6
> 2023-03-02 10:27:22,020 INFO  RoleInfo - alluxio-master-0_19200: start alluxio-master-0_19200@group-ABB3109A44C1-FollowerState
> 2023-03-02 10:27:22,021 INFO  RaftServerConfigKeys - raft.server.rpc.first-election.timeout.min = 10000ms (fallback to raft.server.rpc.timeout.min)
> 2023-03-02 10:27:22,021 INFO  RaftServerConfigKeys - raft.server.rpc.first-election.timeout.max = 20000ms (fallback to raft.server.rpc.timeout.max)
> 2023-03-02 10:27:33,412 INFO  FollowerState - alluxio-master-0_19200@group-ABB3109A44C1-FollowerState: change to CANDIDATE, lastRpcElapsedTime:11392399572ns, electionTimeout:11391ms
> 2023-03-02 10:27:33,412 INFO  RoleInfo - alluxio-master-0_19200: shutdown alluxio-master-0_19200@group-ABB3109A44C1-FollowerState
> 2023-03-02 10:27:33,413 INFO  RaftServerConfigKeys - raft.server.leaderelection.pre-vote = true (default)
> 2023-03-02 10:27:33,413 INFO  RoleInfo - alluxio-master-0_19200: start alluxio-master-0_19200@group-ABB3109A44C1-LeaderElection7
> 2023-03-02 10:27:33,414 INFO  LeaderElection - alluxio-master-0_19200@group-ABB3109A44C1-LeaderElection7 PRE_VOTE round 0: submit vote requests at term 1 for -1: peers:[alluxio-master-0_19200|rpc:alluxio-master-0:19200|priority:0|startupRole:FOLLOWER, alluxio-master-2_19200|rpc:alluxio-master-2:19200|priority:0|startupRole:FOLLOWER, alluxio-master-1_19200|rpc:alluxio-master-1:19200|priority:0|startupRole:FOLLOWER]|listeners:[], old=null
> 2023-03-02 10:27:33,414 INFO  RaftServerConfigKeys - raft.server.rpc.first-election.timeout.min = 10000ms (fallback to raft.server.rpc.timeout.min)
> 2023-03-02 10:27:33,415 INFO  RaftServerConfigKeys - raft.server.rpc.first-election.timeout.max = 20000ms (fallback to raft.server.rpc.timeout.max)
> 2023-03-02 10:27:33,446 INFO  LeaderElection - alluxio-master-0_19200@group-ABB3109A44C1-LeaderElection7: PRE_VOTE REJECTED received 2 response(s) and 0 exception(s):
> {code}
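
To narrow down whether the "Unable to resolve host" errors come from name resolution inside the pod rather than from Ratis itself, a small standalone probe can be run in the leader's container. The following is only a sketch: the class PeerResolveProbe and its methods are hypothetical helpers, not part of Ratis or Alluxio, and the default hostnames are taken from the "rpc:" fields in the logs above. It checks JVM-level resolution via java.net.InetAddress only; a successful lookup here does not by itself show that an existing gRPC channel has re-resolved the name.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical diagnostic helper; not part of Ratis or Alluxio. */
final class PeerResolveProbe {

  /** Returns the resolved IP address for the host, or null if resolution fails. */
  static String resolve(String host) {
    try {
      return InetAddress.getByName(host).getHostAddress();
    } catch (UnknownHostException e) {
      return null;
    }
  }

  /** Resolves each host once, in order; a null value means the host was unresolved. */
  static Map<String, String> probe(List<String> hosts) {
    Map<String, String> results = new LinkedHashMap<>();
    for (String host : hosts) {
      results.put(host, resolve(host));
    }
    return results;
  }

  public static void main(String[] args) {
    // Defaults are the peer hostnames from the logs; pass others as arguments.
    List<String> peers = args.length > 0
        ? Arrays.asList(args)
        : Arrays.asList("alluxio-master-0", "alluxio-master-1", "alluxio-master-2");
    probe(peers).forEach((host, addr) ->
        System.out.println(host + " -> " + (addr == null ? "UNRESOLVED" : addr)));
  }
}
```

Running it repeatedly while the leader is logging failed appendEntries would show whether alluxio-master-0 is unresolvable from that pod at the JVM level during the same window.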


