Posted to dev@iotdb.apache.org by 白 渐 <Er...@hotmail.com> on 2021/08/18 14:13:53 UTC
Conclusion about JIRA issue[IOTDB-1564]: Make leader failure
detection and election faster
Hi, all,
@Xinyu Tan and I have reached a conclusion on refining the heartbeat and election-related timeout parameters:
JIRA link: https://issues.apache.org/jira/browse/IOTDB-1564
Two parameters are added:
heartbeat_interval_ms (t1): The interval (ms) between two consecutive rounds of heartbeat broadcasts from a Raft group leader.
election_timeout_ms (t2 and t3): The election timeout (ms) of followers and candidates. A follower uses it as the heartbeat expiry time (t2), and a candidate uses it as the time to wait for the voting result (t3).
                          t1             t1
Leader view:    Send HB - - -> Send HB - - -> Send HB
                                                 t2                                     t3
Follower view:  Receive HB - - -> Receive HB - - - - -> HB expired / Start election - - - - -> Election Timeout
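
To make the timeline above concrete, here is a minimal Java sketch of how t2 and t3 could be derived from the two parameters. The class and method names are hypothetical and are not taken from the IoTDB code base; the sketch also ignores any random wait before an election starts.

```java
// Hypothetical helper, not IoTDB code: models the t1/t2/t3 timeline from the mail.
final class ElectionTimeline {
    final long heartbeatIntervalMs; // t1: leader sends a heartbeat every t1 ms
    final long electionTimeoutMs;   // t2 and t3: follower and candidate timeout

    ElectionTimeline(long heartbeatIntervalMs, long electionTimeoutMs) {
        this.heartbeatIntervalMs = heartbeatIntervalMs;
        this.electionTimeoutMs = electionTimeoutMs;
    }

    // Time at which the leader is expected to send its next heartbeat (last send + t1).
    long nextHeartbeatAt(long lastSendMs) {
        return lastSendMs + heartbeatIntervalMs;
    }

    // Time at which a follower treats the heartbeat as expired (last receive + t2).
    long heartbeatExpiresAt(long lastReceiveMs) {
        return lastReceiveMs + electionTimeoutMs;
    }

    // Time at which a candidate stops waiting for votes (election start + t3).
    long electionTimesOutAt(long electionStartMs) {
        return electionStartMs + electionTimeoutMs;
    }

    // True once at least t2 ms have passed since the last heartbeat was received.
    boolean isHeartbeatExpired(long lastReceiveMs, long nowMs) {
        return nowMs - lastReceiveMs >= electionTimeoutMs;
    }
}
```

With this model, a leader failure is noticed at most t2 after the last received heartbeat, and an unsuccessful election round is abandoned t3 after it starts.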
I will work on the following items next:
1. Coding.
2. Proper test cases.
3. Docs about new parameters.
Thanks.
Re: Conclusion about JIRA issue[IOTDB-1564]: Make leader failure
detection and election faster
Posted by Xiangdong Huang <sa...@gmail.com>.
Hi Eric,
I just read the HeartbeatThread.java file.
A follower waits for ClusterConstant.getConnectionTimeoutInMS()
to receive the next heartbeat.
If none arrives, it waits a random time and then starts its election.
The leader sends a heartbeat every ClusterConstant.getHeartBeatIntervalMs().
It seems that the follower does not know the heartbeat interval...
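
One possible consequence of the follower not knowing the heartbeat interval is that the two values can be configured inconsistently. The following Java sketch shows a sanity check under that assumption. The class and method names are hypothetical, and nothing here claims to match what HeartbeatThread.java actually does.

```java
// Hypothetical check, not IoTDB code: relates the follower's wait to the leader's interval.
final class HeartbeatConfigCheck {

    // Number of whole heartbeat intervals that fit into the follower's wait time.
    static long intervalsCovered(long heartbeatIntervalMs, long followerTimeoutMs) {
        if (heartbeatIntervalMs <= 0 || followerTimeoutMs <= 0) {
            throw new IllegalArgumentException("both values must be positive");
        }
        return followerTimeoutMs / heartbeatIntervalMs;
    }

    // The follower should wait strictly longer than one heartbeat interval;
    // otherwise it may give up on a healthy leader between two heartbeats.
    static boolean isConsistent(long heartbeatIntervalMs, long followerTimeoutMs) {
        return followerTimeoutMs > heartbeatIntervalMs;
    }
}
```

For example, a 1000 ms interval with a 500 ms follower wait would be flagged as inconsistent, because the follower could time out before the next heartbeat is even due.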
Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University
黄向东
清华大学 软件学院
Eric Pai <Er...@hotmail.com> wrote on Saturday, August 21, 2021 at 7:33 PM:
>
> Hi, all,
>
> Currently the randomElectionWait time is hardcoded to 3-5 s, which is unsuitable when heartbeat_interval_ms and election_timeout_ms are small.
>
> I have decided to change it to [2 * heartbeat_interval_ms, 2 * heartbeat_interval_ms + 50 ms).
>
> The 50 ms comes from the Raft paper: it keeps the probability of split votes low and lets the election finish quickly when split votes do happen.
>
> But I haven't found any detailed description of the relationship between heartbeat_interval_ms and the minimum waiting time.
>
> Any good suggestions?
Re: Conclusion about JIRA issue[IOTDB-1564]: Make leader failure detection and election faster
Posted by Eric Pai <Er...@hotmail.com>.
Hi, all,
Currently the randomElectionWait time is hardcoded to 3-5 s, which is unsuitable when heartbeat_interval_ms and election_timeout_ms are small.
I have decided to change it to [2 * heartbeat_interval_ms, 2 * heartbeat_interval_ms + 50 ms).
The 50 ms comes from the Raft paper: it keeps the probability of split votes low and lets the election finish quickly when split votes do happen.
But I haven't found any detailed description of the relationship between heartbeat_interval_ms and the minimum waiting time.
Any good suggestions?
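
For illustration, the proposed range [2 * heartbeat_interval_ms, 2 * heartbeat_interval_ms + 50 ms) could be sampled as in the Java sketch below. The class, constant, and method names are hypothetical; only the formula comes from the message above, and this is not the actual patch.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper, not IoTDB code: samples the proposed random election wait.
final class RandomElectionWait {
    static final long SPREAD_MS = 50L; // width of the random window proposed in the mail

    // Inclusive lower bound: 2 * heartbeat_interval_ms.
    static long lowerBoundMs(long heartbeatIntervalMs) {
        return 2 * heartbeatIntervalMs;
    }

    // Exclusive upper bound: 2 * heartbeat_interval_ms + 50 ms.
    static long upperBoundMs(long heartbeatIntervalMs) {
        return 2 * heartbeatIntervalMs + SPREAD_MS;
    }

    // Uniformly random wait in [lowerBoundMs, upperBoundMs).
    static long nextWaitMs(long heartbeatIntervalMs) {
        return ThreadLocalRandom.current()
                .nextLong(lowerBoundMs(heartbeatIntervalMs), upperBoundMs(heartbeatIntervalMs));
    }
}
```

With heartbeat_interval_ms = 1000, every sampled wait falls between 2000 ms (inclusive) and 2050 ms (exclusive), so the wait scales with the heartbeat interval instead of staying at the fixed 3-5 s.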