Posted to dev@iotdb.apache.org by 白 渐 <Er...@hotmail.com> on 2021/08/18 14:13:53 UTC

Conclusion about JIRA issue[IOTDB-1564]: Make leader failure detection and election faster

Hi, all,

@Xinyu Tan and I have reached a conclusion on refining the heartbeat- and election-related timeout parameters:

JIRA link: https://issues.apache.org/jira/browse/IOTDB-1564

Two parameters are added:

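heartbeat_interval_ms (t1): The interval, in milliseconds, between two consecutive rounds of heartbeat broadcasts by the leader of a raft group.

election_timeout_ms (t2 and t3): The election timeout, in milliseconds, for followers and candidates. A follower treats the heartbeat as expired and starts an election after this time (t2), and a candidate waits this long for the voting result (t3).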

                       t1             t1
Leader view: Send HB - - -> Send HB - - -> Send HB
                                                t2                                     t3
Follower view: Receive HB - - -> Receive HB - - - - -> HB expired / Start election - - - - -> Election Timeout
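
The timeline above can be sketched in Java as follows. This is only an illustration of how t1, t2 and t3 relate, not the IoTDB implementation: the class ElectionTimingSketch, its methods, and the values 100 ms and 500 ms are assumptions made for the example.

```java
/** Hypothetical illustration of the t1/t2/t3 timeline; not IoTDB code. */
final class ElectionTimingSketch {
    final long heartbeatIntervalMs; // t1: leader sends a heartbeat every t1 ms
    final long electionTimeoutMs;   // t2 and t3: follower and candidate timeout

    ElectionTimingSketch(long heartbeatIntervalMs, long electionTimeoutMs) {
        if (heartbeatIntervalMs <= 0 || electionTimeoutMs <= 0) {
            throw new IllegalArgumentException("intervals must be positive");
        }
        this.heartbeatIntervalMs = heartbeatIntervalMs;
        this.electionTimeoutMs = electionTimeoutMs;
    }

    /** Leader view: time at which the next heartbeat round is due. */
    long nextHeartbeatAt(long lastSentMs) {
        return lastSentMs + heartbeatIntervalMs;
    }

    /** Follower view (t2): no heartbeat for electionTimeoutMs means "HB expired". */
    boolean heartbeatExpired(long lastHeartbeatMs, long nowMs) {
        return nowMs - lastHeartbeatMs >= electionTimeoutMs;
    }

    /** Candidate view (t3): no voting result for electionTimeoutMs means "Election Timeout". */
    boolean electionTimedOut(long electionStartMs, long nowMs) {
        return nowMs - electionStartMs >= electionTimeoutMs;
    }

    public static void main(String[] args) {
        // Illustrative values only; they are not defaults from the thread.
        ElectionTimingSketch s = new ElectionTimingSketch(100, 500);
        System.out.println("next heartbeat due at " + s.nextHeartbeatAt(1000) + " ms");
        System.out.println("expired at 1499 ms? " + s.heartbeatExpired(1000, 1499));
        System.out.println("expired at 1500 ms? " + s.heartbeatExpired(1000, 1500));
    }
}
```

With both timeouts driven by one parameter, lowering election_timeout_ms shortens failure detection and the wait for a voting result together.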

I will work on the following in due course:

1.     Coding.

2.     Proper test cases.

3.     Docs about new parameters.

Thanks.



Re: Conclusion about JIRA issue[IOTDB-1564]: Make leader failure detection and election faster

Posted by Xiangdong Huang <sa...@gmail.com>.
Hi Eric,
I just read the HeartbeatThread.java file.

A follower waits up to ClusterConstant.getConnectionTimeoutInMS() for a
new heartbeat.
If none arrives, it waits a random time and then starts its own election.

The leader sends a heartbeat every ClusterConstant.getHeartBeatIntervalMs().

It seems that the follower does not know the heartbeat interval...
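
The behaviour described above can be sketched roughly as follows. This is not the code from HeartbeatThread.java: the class FollowerWaitSketch, its methods, and the numbers in main are hypothetical, and the two timeouts are passed in as plain parameters rather than read from ClusterConstant.

```java
/** Hypothetical sketch of the follower's wait; not the HeartbeatThread code. */
final class FollowerWaitSketch {
    enum Action { KEEP_WAITING, WAIT_RANDOM_THEN_ELECT }

    /**
     * The follower only compares the silence so far against the connection
     * timeout; the leader's heartbeat interval plays no part in the decision.
     */
    static Action decide(long lastHeartbeatMs, long nowMs, long connectionTimeoutMs) {
        long silenceMs = nowMs - lastHeartbeatMs;
        return silenceMs < connectionTimeoutMs
                ? Action.KEEP_WAITING
                : Action.WAIT_RANDOM_THEN_ELECT;
    }

    /** Roughly how many heartbeat intervals fit into the follower's wait. */
    static long heartbeatIntervalsPerWait(long connectionTimeoutMs, long heartbeatIntervalMs) {
        if (heartbeatIntervalMs <= 0) {
            throw new IllegalArgumentException("heartbeatIntervalMs must be positive");
        }
        return connectionTimeoutMs / heartbeatIntervalMs;
    }

    public static void main(String[] args) {
        // Illustrative values only: a 20 s wait against a 1 s heartbeat.
        System.out.println(decide(0, 5_000, 20_000));
        System.out.println(decide(0, 20_000, 20_000));
        System.out.println(heartbeatIntervalsPerWait(20_000, 1_000) + " intervals per wait");
    }
}
```

Because the two values are independent, shortening the heartbeat interval alone does not make the follower react any sooner.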

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院

Eric Pai <Er...@hotmail.com> wrote on Sat, Aug 21, 2021 at 7:33 PM:
>
> Hi, all,
>
> Currently the randomElectionWait time is hardcoded as 3-5 s, which is unsuitable when heartbeat_interval_ms and election_timeout_ms are small.
>
> I plan to change it to [2 * heartbeat_interval_ms, 2 * heartbeat_interval_ms + 50 ms).
>
> The 50 ms is taken from the Raft paper: it keeps the probability of split votes low and lets elections finish quickly when split votes do happen.
>
> But I haven't found any detailed description of the relationship between heartbeat_interval_ms and the minimum waiting time.
>
> Any good suggestions?

Re: Conclusion about JIRA issue[IOTDB-1564]: Make leader failure detection and election faster

Posted by Eric Pai <Er...@hotmail.com>.
Hi, all,

Currently the randomElectionWait time is hardcoded as 3-5 s, which is unsuitable when heartbeat_interval_ms and election_timeout_ms are small.

I plan to change it to [2 * heartbeat_interval_ms, 2 * heartbeat_interval_ms + 50 ms).

The 50 ms is taken from the Raft paper: it keeps the probability of split votes low and lets elections finish quickly when split votes do happen.

But I haven't found any detailed description of the relationship between heartbeat_interval_ms and the minimum waiting time.

Any good suggestions?
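
For discussion, the proposed range could be computed along these lines. This is a minimal sketch, not the actual patch: the class RandomElectionWaitSketch, the method randomElectionWaitMs and the constant JITTER_MS are hypothetical names, and 100 ms in main is only an example value.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical sketch of the proposed random election wait; not the actual patch. */
final class RandomElectionWaitSketch {
    /** Width of the random window, the 50 ms mentioned above. */
    static final long JITTER_MS = 50;

    /**
     * Returns a wait time drawn uniformly from
     * [2 * heartbeatIntervalMs, 2 * heartbeatIntervalMs + 50).
     */
    static long randomElectionWaitMs(long heartbeatIntervalMs) {
        if (heartbeatIntervalMs <= 0) {
            throw new IllegalArgumentException("heartbeatIntervalMs must be positive");
        }
        // multiplyExact/addExact fail loudly instead of silently overflowing.
        long lower = Math.multiplyExact(2L, heartbeatIntervalMs);
        long upper = Math.addExact(lower, JITTER_MS);
        return ThreadLocalRandom.current().nextLong(lower, upper); // upper bound is exclusive
    }

    public static void main(String[] args) {
        // With a 100 ms heartbeat interval the wait falls in [200, 250) ms.
        System.out.println("random election wait: " + randomElectionWaitMs(100) + " ms");
    }
}
```

The lower bound scales with heartbeat_interval_ms while the window stays fixed at 50 ms, so the waits of different nodes remain spread over the same width regardless of the configured interval.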
