Posted to issues@activemq.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/12/27 02:41:00 UTC
[jira] [Commented] (ARTEMIS-1570) SharedNothingBackup does not replicate all journal from live

    [ https://issues.apache.org/jira/browse/ARTEMIS-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16304151#comment-16304151 ] 

ASF GitHub Bot commented on ARTEMIS-1570:
-----------------------------------------

GitHub user shoukunhuai opened a pull request:

    https://github.com/apache/activemq-artemis/pull/1742

    ARTEMIS-1570 Flush appendExecutor before take journal snapshot

    When the live server starts replication, it must make sure there are
    no pending writes in the message and bindings journals, or we may
    lose journal records during initial replication.
    
    So we need to flush the append executor after acquiring StorageManager's
    write lock and before acquiring Journal's write lock.
    We also set a 10-second timeout on the flush, the same as
    Journal::flushExecutor. If the flush does not complete within 10 seconds,
    we abort replication; the backup will try again later.
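
The flush-with-timeout idea described above can be illustrated with a minimal, self-contained sketch. It is not the code from the pull request and does not use the Artemis API: the class `FlushSketch`, the helper `flush`, and the in-memory `journal` list are hypothetical names chosen for illustration. It also assumes the append executor runs tasks in submission order (here a single-thread executor), so that a marker task completing implies every earlier append has completed.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FlushSketch {

    // Hypothetical helper (not the Artemis API): queue a marker task and wait for it.
    // On an executor that runs tasks in order, the marker only runs after every
    // previously submitted task. Returns false if the timeout expires first.
    static boolean flush(Executor orderedExecutor, long timeout, TimeUnit unit)
            throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        orderedExecutor.execute(done::countDown);
        return done.await(timeout, unit);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService appendExecutor = Executors.newSingleThreadExecutor();
        List<String> journal = new CopyOnWriteArrayList<>(); // stand-in for journal records
        CountDownLatch slowIo = new CountDownLatch(1);

        // Simulate a busy disk: the first task blocks until slowIo is released.
        appendExecutor.execute(() -> {
            try {
                slowIo.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Five appends queue up behind the blocked task.
        for (int i = 0; i < 5; i++) {
            final int id = i;
            appendExecutor.execute(() -> journal.add("record-" + id));
        }

        // A snapshot taken now cannot see the appends that are still pending.
        int seenWithoutFlush = journal.size();

        // Release the simulated IO, then flush before taking the snapshot.
        slowIo.countDown();
        boolean flushed = flush(appendExecutor, 10, TimeUnit.SECONDS);
        int seenAfterFlush = flushed ? journal.size() : -1; // -1: a caller would abort here

        System.out.println("records seen without flush: " + seenWithoutFlush);
        System.out.println("flush completed in time:    " + flushed);
        System.out.println("records seen after flush:   " + seenAfterFlush);

        appendExecutor.shutdown();
    }
}
```

In this sketch the snapshot taken before the flush misses the queued appends, while the one taken after a successful flush includes them. A `false` return corresponds to the "abort replication, retry later" case in the description.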

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shoukunhuai/activemq-artemis flush-journal-executor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/activemq-artemis/pull/1742.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1742
    
----
commit 3d4c45925f2fe274579b262cedd63103ef29cb4e
Author: shoukun <sh...@...>
Date:   2017-12-27T02:23:33Z

    Flush appendExecutor before take journal snapshot
    
    When the live server starts replication, it must make sure there are
    no pending writes in the message and bindings journals, or we may
    lose journal records during initial replication.
    
    So we need to flush the append executor after acquiring StorageManager's
    write lock and before acquiring Journal's write lock.
    We also set a 10-second timeout on the flush, the same as
    Journal::flushExecutor. If the flush does not complete within 10 seconds,
    we abort replication; the backup will try again later.

----


> SharedNothingBackup does not replicate all journal from live
> ------------------------------------------------------------
>
>                 Key: ARTEMIS-1570
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-1570
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.4.0
>         Environment: I'm running the unit test on Windows.
>            Reporter: shoukun huai
>            Priority: Critical
>         Attachments: SharedNothingReplicationTest.java
>
>
> I am trying to test replication when the live server is under heavy IO load.
> Attached is my JUnit test.
> The test uses a slow message persister to simulate a live server that is busy with IO, so JournalImpl's `appendExecutor` is busy.
> After starting the live server, it sends 5 messages, each with a `delay` property of 5000 ms, then starts the backup server and waits until it is replicated. It then sends more messages without a delay.
> After all messages are sent, it stops the live and backup servers, then checks the message journal.
> The backup is missing 2 messages/journal entries.


