Posted to dev@iotdb.apache.org by Eric Pai <Er...@hotmail.com> on 2021/08/29 09:36:15 UTC
Request for review: [IOTDB-1564]: Make leader failure detection and election faster
Hi, Xiangdong and Xinyu,
The PR https://github.com/apache/iotdb/pull/3797 for JIRA https://issues.apache.org/jira/browse/IOTDB-1564 is ready for review.
Please take a look at the code and share any suggestions.
Thanks.
-----Original Message-----
From: Xiangdong Huang <sa...@gmail.com>
Sent: August 25, 2021 12:02
To: dev <de...@iotdb.apache.org>
Subject: Re: Re: Conclusion about JIRA issue[IOTDB-1564]: Make leader failure detection and election faster
Hi,
The current code is:
```
// Least timeout plus a random offset in [0, electionRandomTimeOutMs)
long electionWait =
    ClusterConstant.getElectionLeastTimeOutMs()
        + Math.abs(random.nextLong() % ClusterConstant.getElectionRandomTimeOutMs());
```
where the code comment says that electionLeastTimeOutMs should be at least as long as one heartbeat interval.
IMO, these two parameters are enough, and we do not need to add more.
But their default values can be changed:
1. electionLeastTimeOutMs could default to 2 * heartbeat interval (or something similar) rather than 2 seconds.
2. electionRandomTimeOutMs could default to 50 ms, or to something like heartbeat / 10?
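A minimal sketch of the suggested defaults, assuming both values are derived from the heartbeat interval. The class `ElectionTimeoutDefaults` and its methods are hypothetical illustrations, not the actual `ClusterConstant` API; the factors 2 and 1/10 come from the suggestions above, and the 1 ms floor is an added assumption that keeps the modulo from dividing by zero when the heartbeat interval is very small.

```java
import java.util.Random;

/** Hypothetical helper: derives election timeouts from the heartbeat interval. */
public class ElectionTimeoutDefaults {

  /** Suggested default: the least timeout is twice the heartbeat interval. */
  static long leastTimeoutMs(long heartbeatIntervalMs) {
    return 2 * heartbeatIntervalMs;
  }

  /** Suggested default: random range of heartbeat / 10, floored at 1 ms so the modulo never divides by zero. */
  static long randomTimeoutMs(long heartbeatIntervalMs) {
    return Math.max(1, heartbeatIntervalMs / 10);
  }

  /** Same shape as the current code: least timeout plus a random offset in [0, randomTimeoutMs). */
  static long electionWaitMs(long heartbeatIntervalMs, Random random) {
    long least = leastTimeoutMs(heartbeatIntervalMs);
    long range = randomTimeoutMs(heartbeatIntervalMs);
    return least + Math.abs(random.nextLong() % range);
  }

  public static void main(String[] args) {
    Random random = new Random(42);
    for (long hb : new long[] {100, 1000}) {
      System.out.printf("heartbeat=%d ms -> wait in [%d, %d) ms, sample=%d ms%n",
          hb, leastTimeoutMs(hb), leastTimeoutMs(hb) + randomTimeoutMs(hb),
          electionWaitMs(hb, random));
    }
  }
}
```

With a 100 ms heartbeat this gives a wait in [200, 210) ms; with 1 s it gives [2000, 2100) ms. Scaling the random part with the heartbeat keeps the jitter proportional to the wait, whereas a fixed 50 ms is a quarter of a 200 ms wait but only a small fraction of a 2 s one.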
Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University
黄向东
清华大学 软件学院
On Mon, Aug 23, 2021 at 10:18 AM, Eric Pai <er...@hotmail.com> wrote:
>
> Hi, Xiangdong,
>
> So what do you suggest for the election waiting time? Should we add another configuration parameter called election_wait_time_ms, or leave it as a shorter hard-coded constant?
>
> From: Eric Pai <Er...@hotmail.com>
> Date: Saturday, August 21, 2021 7:32 PM
> To: "dev@iotdb.apache.org" <de...@iotdb.apache.org>
> Subject: Re: Conclusion about JIRA issue[IOTDB-1564]: Make leader failure detection and election faster
>
> Hi, all,
>
> Currently the randomElectionWait time is hard-coded to 3-5 s, which is unsuitable when heartbeat_interval_ms and election_timeout_ms are small.
>
> I plan to change it to [2 * heartbeat_interval_ms, 2 * heartbeat_interval_ms + 50 ms).
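A sketch of the proposed range, sampled with `ThreadLocalRandom.nextLong(origin, bound)`, which returns a value in the half-open interval [origin, bound) and so matches the interval notation above. The class name and the choice of `ThreadLocalRandom` are assumptions for illustration; the factor 2 and the 50 ms width come from the proposal.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical sketch of the range [2 * heartbeat_interval_ms, 2 * heartbeat_interval_ms + 50 ms). */
public class ProposedElectionWait {
  // Fixed jitter width taken from the proposal.
  static final long JITTER_MS = 50;

  static long lowerBoundMs(long heartbeatIntervalMs) {
    return 2 * heartbeatIntervalMs;
  }

  static long upperBoundMs(long heartbeatIntervalMs) {
    return lowerBoundMs(heartbeatIntervalMs) + JITTER_MS;
  }

  /** Uniform sample in the half-open interval [lower, upper). */
  static long sampleMs(long heartbeatIntervalMs) {
    return ThreadLocalRandom.current()
        .nextLong(lowerBoundMs(heartbeatIntervalMs), upperBoundMs(heartbeatIntervalMs));
  }

  public static void main(String[] args) {
    long hb = 100; // example heartbeat_interval_ms
    System.out.println("range = [" + lowerBoundMs(hb) + ", " + upperBoundMs(hb)
        + ") ms, sample = " + sampleMs(hb) + " ms");
  }
}
```

For heartbeat_interval_ms = 100, `main` prints the range [200, 250) ms followed by one random sample from it.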
>
> The 50 ms value comes from the Raft paper, where a random range of that size kept split votes rare and made elections fast when they did happen.
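A simplified way to see why the width of the random range matters: assume two followers draw their timeouts uniformly from a window of width R, and treat a gap shorter than d (roughly the time for a vote request to reach the peers) as a likely split vote. For d <= R the probability is 1 - (1 - d/R)^2. The model, the two-follower restriction and the 5 ms value of d are assumptions for illustration, not figures from the Raft paper or from IoTDB.

```java
import java.util.Random;

/**
 * Hypothetical model: two followers draw timeouts uniformly from a jitter window of width R ms.
 * If they fire within d ms of each other, both may become candidates and split the vote.
 */
public class SplitVoteEstimate {

  /** Closed form of P(|X - Y| < d) for X, Y ~ Uniform[0, R), valid for 0 <= d <= R. */
  static double exact(double jitterMs, double windowMs) {
    double q = 1.0 - windowMs / jitterMs;
    return 1.0 - q * q;
  }

  /** Monte Carlo estimate of the same probability. */
  static double simulate(double jitterMs, double windowMs, int trials, long seed) {
    Random random = new Random(seed);
    int close = 0;
    for (int i = 0; i < trials; i++) {
      double x = random.nextDouble() * jitterMs;
      double y = random.nextDouble() * jitterMs;
      if (Math.abs(x - y) < windowMs) {
        close++;
      }
    }
    return (double) close / trials;
  }

  public static void main(String[] args) {
    double windowMs = 5; // assumed time for a vote request to reach the other peers
    for (double jitterMs : new double[] {10, 50, 150}) {
      System.out.printf("jitter=%.0f ms: exact=%.3f, simulated=%.3f%n",
          jitterMs, exact(jitterMs, windowMs), simulate(jitterMs, windowMs, 1_000_000, 1L));
    }
  }
}
```

With d = 5 ms the closed form gives 0.75 for R = 10 ms, 0.19 for R = 50 ms and about 0.066 for R = 150 ms, and the simulated values should land close to these. Wider jitter makes near-simultaneous timeouts rarer, but it also adds up to R ms of extra waiting, which is the trade-off behind picking a small fixed width such as 50 ms.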
>
> However, I haven't found any detailed discussion of how heartbeat_interval_ms should relate to the minimum waiting time.
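One way to make these relationships explicit is a small startup check. It encodes three constraints: the least election wait covers at least one heartbeat interval (the code comment quoted in this thread), the election timeout exceeds the heartbeat interval (otherwise followers would time out between heartbeats of a healthy leader), and the election timeout is about an order of magnitude above the broadcast time, following the Raft paper's broadcastTime ≪ electionTimeout ≪ MTBF guideline. The class, the factor 10 and the `broadcastTimeMs` input (an assumed, externally measured RPC round trip) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical startup check for the timeout relationships discussed in this thread. */
public class TimingCheck {

  /** Returns human-readable problems; an empty list means the settings look consistent. */
  static List<String> check(
      long heartbeatIntervalMs, long electionLeastMs, long electionTimeoutMs, long broadcastTimeMs) {
    List<String> problems = new ArrayList<>();
    // Code comment quoted in this thread: the least wait should be at least one heartbeat interval.
    if (electionLeastMs < heartbeatIntervalMs) {
      problems.add("election least wait is shorter than the heartbeat interval");
    }
    // Heartbeats from a healthy leader must arrive before followers time out.
    if (electionTimeoutMs <= heartbeatIntervalMs) {
      problems.add("election timeout does not exceed the heartbeat interval");
    }
    // Raft guideline broadcastTime << electionTimeout, read here as a factor of 10 (assumption).
    if (electionTimeoutMs < 10 * broadcastTimeMs) {
      problems.add("election timeout is less than 10x the broadcast time");
    }
    return problems;
  }

  public static void main(String[] args) {
    System.out.println(check(100, 200, 1000, 5)); // consistent example
    System.out.println(check(100, 50, 80, 20)); // violates all three constraints
  }
}
```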
>
> Any good suggestions?
>
> From: 白 渐
> Sent: August 18, 2021 22:14
> To: dev@iotdb.apache.org
> Subject: Conclusion about JIRA issue[IOTDB-1564]: Make leader failure detection and election faster
>
> Hi, all,
>
> @Xinyu Tan and I have reached a conclusion about refining the heartbeat- and election-related timeout parameters:
>
> JIRA link: https://issues.apache.org/jira/browse/IOTDB-1564
>
> Two parameters are added:
>
> heartbeat_interval_ms (t1): the interval (ms) between two consecutive rounds of heartbeat broadcasts by a Raft group leader.
>
> election_timeout_ms (t2 and t3): the election timeout of followers and candidates, also used as the time to wait for the voting result.
>
>                           t1             t1
> Leader view:    Send HB - - -> Send HB - - -> Send HB
>                              t2                  t3
> Follower view:  Receive HB - - -> Receive HB - - - - -> HB expired /
>                                                         Start election - - - - -> Election Timeout
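The follower side of the diagram can be modelled as a small state machine. This sketch passes time in explicitly (no threads or networking) so the transitions are easy to follow: a heartbeat resets the timer, silence for election_timeout_ms starts an election, and a candidate that gets no result within another election_timeout_ms reports an election timeout and retries. Class, method and state names are hypothetical; terms, vote counting and the randomized wait are deliberately left out.

```java
/** Hypothetical, simplified follower-side timer mirroring the timeline diagram. */
public class FollowerTimer {
  enum Role { FOLLOWER, CANDIDATE }

  private final long electionTimeoutMs; // election_timeout_ms (t2 / t3)
  private long lastHeartbeatMs;
  private long electionStartMs;
  private Role role = Role.FOLLOWER;

  FollowerTimer(long electionTimeoutMs, long nowMs) {
    this.electionTimeoutMs = electionTimeoutMs;
    this.lastHeartbeatMs = nowMs;
  }

  /** "Receive HB": a leader heartbeat resets the timer and makes this node a follower again. */
  void onHeartbeat(long nowMs) {
    lastHeartbeatMs = nowMs;
    role = Role.FOLLOWER;
  }

  /** Called periodically; returns the action taken, for illustration only. */
  String tick(long nowMs) {
    if (role == Role.FOLLOWER && nowMs - lastHeartbeatMs >= electionTimeoutMs) {
      role = Role.CANDIDATE; // "HB expired / Start election"
      electionStartMs = nowMs;
      return "START_ELECTION";
    }
    if (role == Role.CANDIDATE && nowMs - electionStartMs >= electionTimeoutMs) {
      electionStartMs = nowMs; // "Election Timeout": no result yet, so retry
      return "ELECTION_TIMEOUT";
    }
    return "WAIT";
  }

  Role role() {
    return role;
  }

  public static void main(String[] args) {
    long heartbeatIntervalMs = 100; // t1, example value
    FollowerTimer follower = new FollowerTimer(300, 0);
    for (long now = 0; now <= 1200; now += 50) {
      // The leader sends a heartbeat every t1 until it fails after 500 ms.
      if (now <= 500 && now % heartbeatIntervalMs == 0) {
        follower.onHeartbeat(now);
      }
      String action = follower.tick(now);
      if (!action.equals("WAIT")) {
        System.out.println(now + " ms: " + action);
      }
    }
  }
}
```

With a 100 ms heartbeat that stops after 500 ms and a 300 ms election timeout, `main` prints `800 ms: START_ELECTION` followed by `1100 ms: ELECTION_TIMEOUT`.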
>
> I will work on the following items:
>
> 1. Coding.
>
> 2. Proper test cases.
>
> 3. Documentation for the new parameters.
>
> Thanks.
>
>
Re: Request for review: [IOTDB-1564]: Make leader failure detection and election faster
Posted by Xiangdong Huang <sa...@gmail.com>.
Hi,
Just one question.
Are there any side effects of reducing heartbeat_interval from 1 second to 100 ms, e.g., on CPU utilization?
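A back-of-the-envelope way to frame the question: each leader sends one heartbeat to each follower per interval, so going from 1 s to 100 ms multiplies heartbeat RPCs by 10. The group count and replication factor below are hypothetical inputs, and the sketch only estimates message rate; the actual CPU cost per heartbeat would still need to be measured.

```java
/** Hypothetical back-of-the-envelope estimate of heartbeat traffic on one node. */
public class HeartbeatLoad {

  /** Heartbeat RPCs per second sent by all Raft group leaders hosted on one node. */
  static double heartbeatsPerSecond(int leaderGroupsOnNode, int replicationFactor, long heartbeatIntervalMs) {
    // Each leader sends one heartbeat to each of its (replicationFactor - 1) followers per interval.
    return leaderGroupsOnNode * (replicationFactor - 1) * (1000.0 / heartbeatIntervalMs);
  }

  public static void main(String[] args) {
    int groups = 10; // assumed number of groups this node leads
    int replicationFactor = 3; // assumed
    System.out.println(heartbeatsPerSecond(groups, replicationFactor, 1000));
    System.out.println(heartbeatsPerSecond(groups, replicationFactor, 100));
  }
}
```

For 10 leader groups with 3 replicas, `main` prints 20.0 and then 200.0 heartbeats per second.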
Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University
黄向东
清华大学 软件学院