Posted to issues@rocketmq.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/04/20 14:22:04 UTC

[jira] [Commented] (ROCKETMQ-184) It takes too long(3-33 seconds) to switch to read from slave when master crashes

    [ https://issues.apache.org/jira/browse/ROCKETMQ-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15976773#comment-15976773 ] 

ASF GitHub Bot commented on ROCKETMQ-184:
-----------------------------------------

GitHub user Jaskey opened a pull request:

    https://github.com/apache/incubator-rocketmq/pull/95

    [ROCKETMQ-184]-It takes too long(3-33 seconds) to switch to read from slave when master crashes

    JIRA:https://issues.apache.org/jira/browse/ROCKETMQ-184?jql=project%20%3D%20ROCKETMQ
    
    Problem: no listener is triggered when the channel is closed.
    
    When an async command is sent to the server and the server crashes before sending the response to the client, the callback cannot be invoked in time. Instead, the callback can only be triggered by the timeout scan service.
    
    This is most noticeable when pulling messages, since the pull timeout is 30 seconds by default. If the master crashes before sending its response to the client, the client cannot re-pull until the scan service times the request out, which takes up to 30 seconds. The re-pull is then delayed by a further 3 seconds, so failing over to read from the slave takes 3-33 seconds when this problem occurs.
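
The idea described above can be sketched as follows. This is a minimal, self-contained illustration and not the code from the pull request: `PendingRequestTable`, `register`, `scanTimeouts`, and `failAllOnChannelClose` are hypothetical names, and channels are modelled as plain strings rather than real network channels. It only shows the difference between waiting for the timeout scan and failing in-flight requests as soon as their channel closes.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical table of in-flight async requests, keyed by request id.
class PendingRequestTable {

    // One in-flight request: the channel it was sent on, its absolute
    // deadline, and the callback to run if it fails.
    private static final class Pending {
        final String channelId;
        final long deadlineMillis;
        final Consumer<Throwable> onException;
        final AtomicBoolean done = new AtomicBoolean(false);

        Pending(String channelId, long deadlineMillis, Consumer<Throwable> onException) {
            this.channelId = channelId;
            this.deadlineMillis = deadlineMillis;
            this.onException = onException;
        }

        // Guard so the callback runs at most once, even if the scan and
        // the close handler race.
        void fail(Throwable cause) {
            if (done.compareAndSet(false, true)) {
                onException.accept(cause);
            }
        }
    }

    private final Map<Integer, Pending> table = new ConcurrentHashMap<>();

    void register(int requestId, String channelId, long deadlineMillis,
                  Consumer<Throwable> onException) {
        table.put(requestId, new Pending(channelId, deadlineMillis, onException));
    }

    // Behaviour described in the issue: only requests past their deadline fail.
    int scanTimeouts(long nowMillis) {
        return failMatching(p -> p.deadlineMillis <= nowMillis,
                new RuntimeException("request timed out"));
    }

    // Sketch of the fix: fail every request on a closed channel right away.
    int failAllOnChannelClose(String channelId) {
        return failMatching(p -> p.channelId.equals(channelId),
                new IOException("channel closed: " + channelId));
    }

    // Removes and fails every matching request; returns how many were failed.
    private int failMatching(Predicate<Pending> matches, Throwable cause) {
        int failed = 0;
        Iterator<Map.Entry<Integer, Pending>> it = table.entrySet().iterator();
        while (it.hasNext()) {
            Pending p = it.next().getValue();
            if (matches.test(p)) {
                it.remove();
                p.fail(cause);
                failed++;
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        PendingRequestTable requests = new PendingRequestTable();
        requests.register(1, "master", 30_000L,
                e -> System.out.println("callback: " + e.getMessage()));
        // One second in, the scan finds nothing to fail yet.
        System.out.println("failed by scan: " + requests.scanTimeouts(1_000L));
        // The close handler fails the request without waiting for the deadline.
        System.out.println("failed on close: " + requests.failAllOnChannelClose("master"));
    }
}
```

In a real client the close hook would be driven by the networking layer's channel-inactive or channel-close event; which event the pull request actually uses is not stated here.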

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Jaskey/incubator-rocketmq ROCKETMQ-184-slave-switch

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-rocketmq/pull/95.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #95
    
----
commit 0dc37e1c10450f143e58faa857f5bf8ccbee1ca3
Author: Jaskey <li...@gmail.com>
Date:   2017-04-19T09:13:09Z

    invoke callback when channel is close after  sending async command

----


> It takes too long(3-33 seconds) to switch to read from slave when master crashes
> --------------------------------------------------------------------------------
>
>                 Key: ROCKETMQ-184
>                 URL: https://issues.apache.org/jira/browse/ROCKETMQ-184
>             Project: Apache RocketMQ
>          Issue Type: Improvement
>          Components: rocketmq-client, rocketmq-remoting
>            Reporter: Jaskey Lam
>            Assignee: Xiaorui Wang
>             Fix For: 4.2.0-incubating
>
>
> When the master crashes, no callback is triggered to pull messages again.
> Instead, the client relies on the scan service to time the request out and only then re-pulls.
> But the pull command has a 30-second timeout, and after the timeout the next pull is scheduled 3 seconds later.
> So it takes 3 to 33 seconds to switch to the slave, which is too long and can be optimized.
> The root cause is that the re-pull below takes too long to be triggered when the master crashes:
> {code}
>             @Override
>             public void onException(Throwable e) {
>                 if (!pullRequest.getMessageQueue().getTopic().startsWith(MixAll.RETRY_GROUP_TOPIC_PREFIX)) {
>                     log.warn("execute the pull request exception", e);
>                 }
>                 DefaultMQPushConsumerImpl.this.executePullRequestLater(pullRequest, PULL_TIME_DELAY_MILLS_WHEN_EXCEPTION);
>             }
> {code}
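
As a rough check on the 3 to 33 second range quoted above, the delay can be modelled as the time left on the 30-second pull timeout when the master crashes, plus the 3-second retry delay. The class and method names below are hypothetical, the two constants are taken from the issue text, and the model ignores the granularity of the scan service, so treat it as an approximation only.

```java
// Hypothetical helper that models the failover delay described in the issue.
class FailoverDelay {
    static final long PULL_TIMEOUT_MILLIS = 30_000L; // default pull timeout (from the issue)
    static final long RETRY_DELAY_MILLIS = 3_000L;   // delay before the re-pull (from the issue)

    // elapsedMillis: how long the pull had been outstanding when the master crashed.
    static long delayWithoutFix(long elapsedMillis) {
        long remainingTimeout = Math.max(0L, PULL_TIMEOUT_MILLIS - elapsedMillis);
        return remainingTimeout + RETRY_DELAY_MILLIS;
    }

    // Assumption: if the callback fires as soon as the channel closes,
    // only the retry delay remains.
    static long delayOnImmediateCallback() {
        return RETRY_DELAY_MILLIS;
    }

    public static void main(String[] args) {
        System.out.println("crash right after sending: " + delayWithoutFix(0L) + " ms");
        System.out.println("crash just before timeout: " + delayWithoutFix(30_000L) + " ms");
        System.out.println("callback on channel close: " + delayOnImmediateCallback() + " ms");
    }
}
```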



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)