Posted to dev@rocketmq.apache.org by xijiu <42...@qq.com.INVALID> on 2022/03/03 11:47:45 UTC

Re: [VOTE][RIP-36] Optimize topic routing mechanism

Thanks for your reply~

There are two points I will briefly explain:

1. Besides the client not being able to obtain a topic's latest routing data in time, the current version also faces the following problems:

If a client frequently accesses a topic that does not exist, and automatic topic creation is not allowed, the path of each access becomes lengthy and requires two network requests:

send request to nameServer to get topic route data

send default topic TBW102 request to nameServer

In some applications, the client accesses many topics. After being accessed once, these topics may never be accessed again, or only very infrequently, yet the client still pulls their routing information from the NameServer on every polling cycle, which increases network overhead. These zombie topics also occupy extra memory.

Related issues:

https://github.com/apache/rocketmq/issues/3207

https://github.com/apache/rocketmq/issues/3858

https://github.com/apache/rocketmq/issues/3870

2. This change is positioned as a lightweight supplement to the current polling strategy. The notification strategy is also adjusted according to how busy the machine is, and it is switched off automatically once the machine load reaches a certain threshold. Therefore, this complexity will not put pressure on stability.
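
To make the second point concrete, here is a minimal sketch of a load-based switch for the notification path. It is only an illustration under stated assumptions: the class name `LoadAwareNotificationGate`, the load probe, and the threshold value are all hypothetical and do not come from the RIP or from RocketMQ's code.

```java
import java.util.function.DoubleSupplier;

/** Hypothetical gate that decides whether route-change pushes are sent. */
final class LoadAwareNotificationGate {
    private final DoubleSupplier loadProbe;  // assumed source of the current machine load
    private final double disableThreshold;   // assumed load at or above which pushing stops

    LoadAwareNotificationGate(DoubleSupplier loadProbe, double disableThreshold) {
        this.loadProbe = loadProbe;
        this.disableThreshold = disableThreshold;
    }

    /** Returns true while the measured load is below the threshold. */
    boolean pushEnabled() {
        return loadProbe.getAsDouble() < disableThreshold;
    }

    /**
     * Runs the push only when enabled and reports whether it ran.
     * When it is skipped, clients still converge through their regular polling.
     */
    boolean notifyIfAllowed(Runnable push) {
        if (!pushEnabled()) {
            return false;
        }
        push.run();
        return true;
    }
}
```

Because a skipped push only delays an update until the next polling cycle, the gate can fail closed under load without affecting correctness.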







------------------ Original Message ------------------
From: "dev" <vintagewang@apache.org>;
Sent: Wednesday, March 2, 2022, 9:00 PM
To: "dev" <dev@rocketmq.apache.org>;

Subject: Re: [VOTE][RIP-36] Optimize topic routing mechanism



I read the whole plan. It is beneficial for the nameserver to actively push
changes to the client, but this benefit also brings complexity. I
personally think this benefit is not very big. Unless there is a better
explanation, I will reject this proposal.

Best regards,

Xiaorui Wang 王小瑞
Apache RocketMQ PMC Chair


On Wed, Mar 2, 2022 at 19:42, xijiu <422766572@qq.com.invalid> wrote:

> Hi, RocketMQ Community,
>
> As discussed in the previous email, we launched a new RIP to optimize
> the topic routing mechanism. Now the shepherds @dongeforever and @yukon are
> willing to support the RIP, so I think it is time to start an email thread
> to enter the voting process.
>
>
> The vote will be open for at least 72 hours or until the necessary number of
> votes is reached.
>
> Please vote accordingly:
>
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove with the reason
>
>
> Best Regards!
> xijiu
>
> links:
> https://shimo.im/docs/vVAXVrDNnoSrMBqm/

Re: [VOTE][RIP-36] Optimize topic routing mechanism

Posted by Xiaorui Wang <vi...@apache.org>.
+1

This RIP aims to reduce the unavailable time of ordered messages during
Broker switching, cutting it from 30s to less than 1s. This
improvement is of great value and worthy of recognition.

I would like to ask other PMC members to review the details of the whole
proposal. As complexity increases, there may be other potential problems,
which I hope can be kept under control. IMO, I quite agree to vote for the
proposal.

Best regards,

Xiaorui Wang 王小瑞
Apache RocketMQ PMC chair


On Fri, Mar 4, 2022 at 11:25, dongeforever <do...@apache.org> wrote:

> The core problem is up to 30 seconds of unavailable time during broker
> startup/shutdown or logic queue remapping, because metadata discovery by
> scheduled pull is too slow.
> For non-ordered topics, messages fail over to another broker. But for
> ordered topics, or more precisely topics with a fixed queue number, the
> unavailable time will be up to 30 seconds. This is not tolerable.
> Adding the push mechanism will decrease the unavailable time from 30
> seconds to 1~2 seconds.
>
> BTW, we should also pay attention to the complexity. To minimize it, the
> push mechanism will be a bypass flow and will not harm the main pull flow.
>
> As for the "topic or broker not exist" and "resource overhead" problems,
> they are just polished in passing.
>
> The original issue is https://github.com/apache/rocketmq/issues/3843,
> which aims to reduce the unavailable time during broker (with dledger) role
> changes and to reduce the impact on sequential message producers.

Re: [VOTE][RIP-36] Optimize topic routing mechanism

Posted by dongeforever <do...@apache.org>.
The core problem is up to 30 seconds of unavailable time during broker
startup/shutdown or logic queue remapping, because metadata discovery by
scheduled pull is too slow.
For non-ordered topics, messages fail over to another broker. But for
ordered topics, or more precisely topics with a fixed queue number, the
unavailable time will be up to 30 seconds. This is not tolerable.
Adding the push mechanism will decrease the unavailable time from 30
seconds to 1~2 seconds.

BTW, we should also pay attention to the complexity. To minimize it, the
push mechanism will be a bypass flow and will not harm the main pull flow.

As for the "topic or broker not exist" and "resource overhead" problems,
they are just polished in passing.

The original issue is https://github.com/apache/rocketmq/issues/3843,
which aims to reduce the unavailable time during broker (with dledger) role
changes and to reduce the impact on sequential message producers.
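
As a rough illustration of the "bypass flow" idea, the sketch below keeps the scheduled pull as the main path and lets a pushed update write into the same client-side table. Everything here is an assumption made for the example: the `Route` and `RouteTable` names, the lookup function standing in for the name server request, and in particular the version number used to discard stale updates, which this thread does not specify.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical versioned route entry; not RocketMQ's own route data type. */
final class Route {
    final long version;
    final String brokerAddr;

    Route(long version, String brokerAddr) {
        this.version = version;
        this.brokerAddr = brokerAddr;
    }
}

/** Client-side route table fed by the main pull flow and a bypass push flow. */
final class RouteTable {
    private final Map<String, Route> routes = new ConcurrentHashMap<>();
    private final Function<String, Route> nameServerLookup; // assumed stand-in for the pull request

    RouteTable(Function<String, Route> nameServerLookup) {
        this.nameServerLookup = nameServerLookup;
    }

    /** Main flow: invoked by the client's periodic polling task. */
    void scheduledPull(String topic) {
        Route fetched = nameServerLookup.apply(topic);
        if (fetched != null) {
            apply(topic, fetched);
        }
    }

    /** Bypass flow: invoked when a route change is pushed to the client. */
    void onPush(String topic, Route pushed) {
        apply(topic, pushed);
    }

    /** Keeps the route with the higher version so a stale update cannot roll the table back. */
    private void apply(String topic, Route candidate) {
        routes.merge(topic, candidate,
                (current, next) -> next.version > current.version ? next : current);
    }

    Route get(String topic) {
        return routes.get(topic);
    }
}
```

Since both flows go through the same `apply` step, a lost or disabled push only means the client waits for the next scheduled pull, which is the sense in which the push path does not harm the main flow.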






Re: [VOTE][RIP-36] Optimize topic routing mechanism

Posted by Xiaorui Wang <vi...@apache.org>.
Thank you for your prompt reply.

I have read your email carefully and understand that it is mainly about
the following two problems.

Problem one: When a client accesses a topic that does not exist, the path
of each access is twice as long.

Problem two: As the number of topics grows, polling increases network
overhead and memory usage.

For both, I hope you can provide more quantitative data.

IMO, I have the following suggestions for these problems, for reference
only.

For problem one: If a push mechanism is added, will the pull mechanism be
removed? Otherwise the problem will still exist.

For problem two: Do the network and memory overheads have a significant
impact on the application if nothing is changed?

I hope my advice is helpful to you rather than disturbing. Our common goal
is to fully discuss an architectural change and make it better.

Best regards,

Xiaorui Wang 王小瑞
Apache RocketMQ PMC Chair

