You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@rocketmq.apache.org by "Yu Kaiyuan (JIRA)" <ji...@apache.org> on 2017/08/23 07:55:00 UTC

[jira] [Updated] (ROCKETMQ-272) The config `syncFlushTimeout` doesn't work for SYNC_MASTER

     [ https://issues.apache.org/jira/browse/ROCKETMQ-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yu Kaiyuan updated ROCKETMQ-272:
--------------------------------
    Description: 
It's quite frequent to get result as `sendStatus=FLUSH_SLAVE_TIMEOUT` when sending big messages(>500k) in SYNC_MASTER/SLAVE scenario.
The timeout value used by the sync process currently as I found, is the config `syncFlushTimeout`. And its default value is 5000 milliseconds.
But it shows that producer get the result as `FLUSH_SLAVE_TIMEOUT` less than 1 second. 
So why does the config not work as expected?

Relevant code:

{code:java}
// CommitLog.java
public void handleHA(AppendMessageResult result, PutMessageResult putMessageResult, MessageExt messageExt) {
    if (BrokerRole.SYNC_MASTER == this.defaultMessageStore.getMessageStoreConfig().getBrokerRole()) {
        HAService service = this.defaultMessageStore.getHaService();
        if (messageExt.isWaitStoreMsgOK()) {
            // Determine whether to wait
            if (service.isSlaveOK(result.getWroteOffset() + result.getWroteBytes())) {
                GroupCommitRequest  request = new GroupCommitRequest(result.getWroteOffset() + result.getWroteBytes());
                service.putRequest(request);
                service.getWaitNotifyObject().wakeupAll();
                boolean flushOK =
                    request.waitForFlush(this.defaultMessageStore.getMessageStoreConfig().getSyncFlushTimeout());
                if (!flushOK) {
                    log.error("do sync transfer other node, wait return, but failed, topic: " + messageExt.getTopic() + " tags: "
                        + messageExt.getTags() + " client address: " + messageExt.getBornHostNameString());
                    putMessageResult.setPutMessageStatus(PutMessageStatus.FLUSH_SLAVE_TIMEOUT);
                }
            }
            // Slave problem
            else {
                // Tell the producer, slave not available
                putMessageResult.setPutMessageStatus(PutMessageStatus.SLAVE_NOT_AVAILABLE);
            }
        }
    }
}
{code}


  was:
It's quite frequent to get result as `sendStatus=FLUSH_SLAVE_TIMEOUT` when sending big messages(>500k) in SYNC_MASTER/SLAVE scenario.
The timeout value used by the sync process currently as I found, is the config `syncFlushTimeout`. And its default value is 5000 milliseconds.
But it shows that producer get the result as `FLUSH_SLAVE_TIMEOUT` less than 1 second. 
So why does the config not work as expected?

Relevant code:

{code:java}
// CommitLog.java
public PutMessageResult putMessage(final MessageExtBrokerInner msg) {
{
    // Synchronous write double
    if (BrokerRole.SYNC_MASTER == this.defaultMessageStore.getMessageStoreConfig().getBrokerRole()) {
        HAService service = this.defaultMessageStore.getHaService();
        if (msg.isWaitStoreMsgOK()) {
            // Determine whether to wait
            if (service.isSlaveOK(result.getWroteOffset() + result.getWroteBytes())) {
                if (null == request) {
                    request = new GroupCommitRequest(result.getWroteOffset() + result.getWroteBytes());
                }
                service.putRequest(request);

                service.getWaitNotifyObject().wakeupAll();

                boolean flushOK =
                    // TODO
                    request.waitForFlush(this.defaultMessageStore.getMessageStoreConfig().getSyncFlushTimeout());
                if (!flushOK) {
                    log.error("do sync transfer other node, wait return, but failed, topic: " + msg.getTopic() + " tags: "
                        + msg.getTags() + " client address: " + msg.getBornHostString());
                    putMessageResult.setPutMessageStatus(PutMessageStatus.FLUSH_SLAVE_TIMEOUT);
                }
            }
            // Slave problem
            else {
                // Tell the producer, slave not available
                putMessageResult.setPutMessageStatus(PutMessageStatus.SLAVE_NOT_AVAILABLE);
            }
        }
    }

    return putMessageResult;
}
{code}



> The config `syncFlushTimeout` doesn't work for SYNC_MASTER
> ----------------------------------------------------------
>
>                 Key: ROCKETMQ-272
>                 URL: https://issues.apache.org/jira/browse/ROCKETMQ-272
>             Project: Apache RocketMQ
>          Issue Type: Bug
>          Components: rocketmq-broker
>    Affects Versions: 4.1.0-incubating
>            Reporter: Yu Kaiyuan
>            Assignee: yukon
>
> It's quite frequent to get result as `sendStatus=FLUSH_SLAVE_TIMEOUT` when sending big messages(>500k) in SYNC_MASTER/SLAVE scenario.
> The timeout value used by the sync process currently as I found, is the config `syncFlushTimeout`. And its default value is 5000 milliseconds.
> But it shows that producer get the result as `FLUSH_SLAVE_TIMEOUT` less than 1 second. 
> So why does the config not work as expected?
> Relevant code:
> {code:java}
> // CommitLog.java
> public void handleHA(AppendMessageResult result, PutMessageResult putMessageResult, MessageExt messageExt) {
>     if (BrokerRole.SYNC_MASTER == this.defaultMessageStore.getMessageStoreConfig().getBrokerRole()) {
>         HAService service = this.defaultMessageStore.getHaService();
>         if (messageExt.isWaitStoreMsgOK()) {
>             // Determine whether to wait
>             if (service.isSlaveOK(result.getWroteOffset() + result.getWroteBytes())) {
>                 GroupCommitRequest  request = new GroupCommitRequest(result.getWroteOffset() + result.getWroteBytes());
>                 service.putRequest(request);
>                 service.getWaitNotifyObject().wakeupAll();
>                 boolean flushOK =
>                     request.waitForFlush(this.defaultMessageStore.getMessageStoreConfig().getSyncFlushTimeout());
>                 if (!flushOK) {
>                     log.error("do sync transfer other node, wait return, but failed, topic: " + messageExt.getTopic() + " tags: "
>                         + messageExt.getTags() + " client address: " + messageExt.getBornHostNameString());
>                     putMessageResult.setPutMessageStatus(PutMessageStatus.FLUSH_SLAVE_TIMEOUT);
>                 }
>             }
>             // Slave problem
>             else {
>                 // Tell the producer, slave not available
>                 putMessageResult.setPutMessageStatus(PutMessageStatus.SLAVE_NOT_AVAILABLE);
>             }
>         }
>     }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)