You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@rocketmq.apache.org by GitBox <gi...@apache.org> on 2019/10/15 07:36:32 UTC
[GitHub] [rocketmq] BeiKeJieDeLiuLangMao commented on issue #686: MQFaultStrategy optimize

BeiKeJieDeLiuLangMao commented on issue #686: MQFaultStrategy optimize
URL: https://github.com/apache/rocketmq/pull/686#issuecomment-542081271
 
 
   > > @alaric27
   > > 我分析了一下，不知道对不对。
   > > ```
   > > public MessageQueue selectOneMessageQueue(final TopicPublishInfo tpInfo, /*通常情况下，指上次请求失败时用到的节点*/final String lastBrokerName) {
   > >         if (this.sendLatencyFaultEnable) {
   > >             try {
   > >                 // 外层轮询，下次请求时会选择下一个队列
   > >                 int index = tpInfo.getSendWhichQueue().getAndIncrement();
   > >                 for (int i = 0; i < tpInfo.getMessageQueueList().size(); i++) {
   > >                     // index用于内层轮询，从而排除不可用的节点
   > >                     int pos = Math.abs(index++) % tpInfo.getMessageQueueList().size();
   > >                     if (pos < 0)
   > >                         pos = 0;
   > >                     MessageQueue mq = tpInfo.getMessageQueueList().get(pos);
   > >                     // 验证可用性，latencyFaultTolerance存储了各个Broker发送消息的耗时，已经预估的下次可用时间点
   > >                     if (latencyFaultTolerance.isAvailable(mq.getBrokerName())) {
   > >                         // 如果是第一次请求，并且可用，则直接返回
   > >                         // 如果是重试请求，则只使用brokerName等于lastBrokerName相同的，这点大家肯定有疑问：为啥要使用上次失败的，其实这里的lastBrokerName不单单指上次发送失败的节点，
   > >                         // 它还能蕴含推荐节点的信息，本函数的后半段中会看到如何将推荐节点的信息传递到lastBrokerName
   > >                         // 但是我觉得这样写的一个弊端是：当前可用节点，如果和上次失败时所用的节点不一致时，就会被排除掉，这也会影响效率
   > >                         if (null == lastBrokerName || mq.getBrokerName().equals(lastBrokerName))
   > >                             return mq;
   > >                     }
   > >                 }
   > >                 // 走到这一步，代表所有节点都不可用（首次请求）或者上一次失败时用到的节点仍然不可用（重试请求）
   > >                 // 随后，将所有不可用的节点，按照潜在可用性（当前可用与否>上次使用该节点时的调用耗时>预估的下次可使用时间），进行了排序，然后选择最优的结果
   > >                 // 这里不用担心，连续重试时pickOneAtLeast每次都选择了相同节点，因为其内部也是用了轮训机制，会从最优->次优的顺序给出下一次推荐的节点
   > >                 final String notBestBroker = latencyFaultTolerance.pickOneAtLeast();
   > >                 //getQueueIdByBroker这个函数名也是惊到我了，完全和其功能不匹配，本行代码意在得到推荐节点是否仍然存在可写队列，如果存在，得出队列数
   > >                 int writeQueueNums = tpInfo.getQueueIdByBroker(notBestBroker);
   > >                 // 我们得到的推荐节点存在可写队列
   > >                 if (writeQueueNums > 0) {
   > >                     // 至此我们已经拿到了一个推荐节点，但是接下来代码的作者并没有简单地根据推荐节点来寻找队列，而是靠外层轮训找了下一个队列
   > >                     // 如果在次重试过程中没有发生路由信息更新的话，该队列应该仍然是不可用的，并且很可能仍是最初失败的Broker节点的队列，
   > >                     // 为什么这么说：假如消息队列为[BrokerA1,BrokerA2,BrokerA3,BrokerA4,BrokerB1,BrokerB2],只有当上一次轮训到BrokerA4时，这里才会跳过BrokerA而得到BrokerB的队列
   > >                     // 而且，可能下次更新路由表时，该信息可能就会成为过期数据而被GC，我猜作者是觉得反正这个数据快没用了，不如把它替换成刚才得到的推荐节点信息，这样可以少new一个对象，还能增加下次查找时的效率
   > >                     // 之所以说它能增加效率，是因为这个过程实质上是将一个很可能宕机的节点队列换成了最可能可用的节点信息，那么下次再轮训到这个节点时，实际上就跳过了寻找推荐节点的过程
   > >                     final MessageQueue mq = tpInfo.selectOneMessageQueue();
   > >                     if (notBestBroker != null) {
   > >                         // 把得到的队列的BrokerName改成我们前面得到的推荐节点，这样如果再请求失败并且重试的时候，lastBrokerName其实存储的就是推荐节点的信息了，下次再执行本函数时就会优先使用推荐节点的其他队列
   > >                         mq.setBrokerName(notBestBroker);
   > >                         // 重新计算其队列编号，因为得到的这个队列数可能和推荐节点的队列数不一致，如果用了错误的队列序号，消息发送到Broker那时，肯定会报错
   > >                         // 因此，这里基于外层轮询使用的index，对本次使用的队列编号进行了计算，我觉得这里最终达到的是随机选择的效果
   > >                         mq.setQueueId(tpInfo.getSendWhichQueue().getAndIncrement() % writeQueueNums);
   > >                     }
   > >                     return mq;
   > >                 } else { // 推荐节点不存在可写队列了，说明该节点可能已经宕机，并且NameServer已经删除了其路由信息，并且已经同步过来了
   > >                     // 这时候可以将该节点从延迟统计表中删除，不在考虑该节点
   > >                     latencyFaultTolerance.remove(notBestBroker);
   > >                 }
   > >             } catch (Exception e) {
   > >                 log.error("Error occurred when selecting message queue", e);
   > >             }
   > >             // 如果拿到的推荐节点已经不存在可写队列了，就随机选一个队列
   > >             return tpInfo.selectOneMessageQueue();
   > >         }
   > >         // 默认策略
   > >         return tpInfo.selectOneMessageQueue(lastBrokerName);
   > >     }
   > > ```
   > 
   > 这里：
   > 
   > > ```
   > >                     // 把得到的队列的BrokerName改成我们前面得到的推荐节点，这样如果再请求失败并且重试的时候，lastBrokerName其实存储的就是推荐节点的信息了，下次再执行本函数时就会优先使用推荐节点的其他队列
   > >                     mq.setBrokerName(notBestBroker);
   > > ```
   > 
   > 你的说法我完全不能理解，这里是选中了notBestBroker，如果再请求失败并且重试的时候，lastBrokerName就是这里的notBestBroker，不理解，为啥都失败了，还会优先使用notBestBroker。
   
   如果发送失败了，还会优先使用notBestBroker，因为它很可能仍然是最佳的，但是要清楚这里还有一个前提，就是notBestBroker的 mq 可用时才会使用，如果mq仍然不可用的话，它还会进入后半段，再选下一个notBestBroker。
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services