Posted to issues@ratis.apache.org by "Sammi Chen (Jira)" <ji...@apache.org> on 2021/12/10 12:26:00 UTC

[jira] [Commented] (RATIS-1465) Use seperate channel for group heartbeat

    [ https://issues.apache.org/jira/browse/RATIS-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457112#comment-17457112 ] 

Sammi Chen commented on RATIS-1465:
-----------------------------------

Here are the logs:

2021-12-02 13:57:37,719 [98e5b27a-c3e9-4f86-ab85-b2caf84f012b@group-151DB3A92008-LeaderStateImpl] WARN org.apache.ratis.server.RaftServer$Division: 98e5b27a-c3e9-4f86-ab85-b2caf84f012b@group-151DB3A92008-LeaderStateImpl: Lost leadership on term: 1. Election timeout: 5200ms. In charge for: 2290684ms. Conf: 0: [5a4a8be1-c921-4ca7-af7c-62a37a55cab7|rpc:9.186.21.247:9856|admin:9.186.21.247:9857|client:9.186.21.247:9858|dataStream:|priority:0, efdf0ed2-f836-4f4b-9dc8-981416d8a68d|rpc:9.37.156.222:9856|admin:9.37.156.222:9857|client:9.37.156.222:9858|dataStream:|priority:0, 98e5b27a-c3e9-4f86-ab85-b2caf84f012b|rpc:9.186.21.242:9856|admin:9.186.21.242:9857|client:9.186.21.242:9858|dataStream:|priority:1], old=null
2021-12-02 13:57:37,719 [98e5b27a-c3e9-4f86-ab85-b2caf84f012b@group-151DB3A92008-LeaderStateImpl] WARN org.apache.ratis.server.RaftServer$Division: Follower 98e5b27a-c3e9-4f86-ab85-b2caf84f012b@group-151DB3A92008->5a4a8be1-c921-4ca7-af7c-62a37a55cab7(c138703,m138989,n139378, attendVote=true, lastRpcSendTime=987, lastRpcResponseTime=8491)
2021-12-02 13:57:37,719 [98e5b27a-c3e9-4f86-ab85-b2caf84f012b@group-151DB3A92008-LeaderStateImpl] WARN org.apache.ratis.server.RaftServer$Division: Follower 98e5b27a-c3e9-4f86-ab85-b2caf84f012b@group-151DB3A92008->efdf0ed2-f836-4f4b-9dc8-981416d8a68d(c138814,m139094,n139378, attendVote=true, lastRpcSendTime=318, lastRpcResponseTime=6570)
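
One way to read the lines above against the election timeout is sketched below. It assumes that lastRpcResponseTime is the elapsed time in milliseconds since the follower's last response, and that the leader counts itself toward the majority. The class and method names are hypothetical; this is an illustration, not the actual Ratis check.

```java
import java.util.List;

// Hypothetical sketch: does the leader still hear from a majority of the group?
class LeadershipCheck {
    /**
     * @param electionTimeoutMs      election timeout from the log (5200ms above)
     * @param followerLastResponseMs assumed elapsed ms since each follower last responded
     * @return true if the leader plus the responsive followers form a majority
     */
    static boolean hasMajority(long electionTimeoutMs, List<Long> followerLastResponseMs) {
        int groupSize = followerLastResponseMs.size() + 1; // followers plus the leader
        long responsive = 1 + followerLastResponseMs.stream()
                .filter(elapsed -> elapsed <= electionTimeoutMs)
                .count(); // the leader counts itself
        return responsive > groupSize / 2;
    }

    public static void main(String[] args) {
        // Values copied from the two "Follower" log lines above.
        System.out.println(hasMajority(5200, List.of(8491L, 6570L)));
    }
}
```

Under these assumptions both followers exceed the 5200ms timeout (8491ms and 6570ms), so only the leader itself counts and the majority is lost.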

> Use seperate channel for group heartbeat
> ----------------------------------------
>
>                 Key: RATIS-1465
>                 URL: https://issues.apache.org/jira/browse/RATIS-1465
>             Project: Ratis
>          Issue Type: Improvement
>            Reporter: Sammi Chen
>            Assignee: Sammi Chen
>            Priority: Major
>         Attachments: follower-hb-process-latency-with-patch.png, follower-hb-process-latency.png, leader-hb-receive-latency-1.png, leader-hb-receive-latency-with-patch.png
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In a cluster under heavy read/write load, frequent leader step-downs are observed because the leader loses the heartbeat from the majority.
> The investigation shows that heartbeat processing on the follower side is very quick, while heartbeat latency on the leader side is high. See the attached metrics diagrams.
> This task aims to use a separate gRPC channel for heartbeats to reduce the latency introduced by network queuing.
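
The queuing effect described above can be illustrated with a toy model. It assumes a channel delivers messages strictly in FIFO order, and the message costs are made-up numbers chosen only for illustration. All names are hypothetical, and nothing here is taken from the Ratis patch or the gRPC API.

```java
import java.util.List;

// Hypothetical toy model of one FIFO channel; not Ratis or gRPC code.
class ChannelQueueSketch {
    static final class Msg {
        final boolean heartbeat;
        final long costMs; // illustrative time to transmit/process this message
        Msg(boolean heartbeat, long costMs) {
            this.heartbeat = heartbeat;
            this.costMs = costMs;
        }
    }

    /** Returns when the first heartbeat in the queue completes, or -1 if there is none. */
    static long heartbeatLatencyMs(List<Msg> fifo) {
        long clock = 0;
        for (Msg m : fifo) {
            clock += m.costMs; // every earlier message delays the ones behind it
            if (m.heartbeat) {
                return clock;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Shared channel: the heartbeat waits behind three large log-append messages.
        List<Msg> shared = List.of(
                new Msg(false, 40), new Msg(false, 40), new Msg(false, 40), new Msg(true, 1));
        // Dedicated channel: the heartbeat has the queue to itself.
        List<Msg> dedicated = List.of(new Msg(true, 1));
        System.out.println(heartbeatLatencyMs(shared));
        System.out.println(heartbeatLatencyMs(dedicated));
    }
}
```

In this model the heartbeat on the shared queue completes only after all the data messages ahead of it, while on its own queue it completes immediately. That is the motivation for giving heartbeats a separate channel.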


