Posted to issues@activemq.apache.org by "Francesco Nigro (Jira)" <ji...@apache.org> on 2020/08/12 21:22:00 UTC

[jira] [Commented] (ARTEMIS-2877) Improve journal replication scalability

    [ https://issues.apache.org/jira/browse/ARTEMIS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176611#comment-17176611 ] 

Francesco Nigro commented on ARTEMIS-2877:
------------------------------------------

I've created a branch https://github.com/franz1981/activemq-artemis/tree/speed_up_core_mmap that collects the many changes mentioned in this JIRA and that, according to my tests, solves the scalability issue.

> Improve journal replication scalability 
> ----------------------------------------
>
>                 Key: ARTEMIS-2877
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2877
>             Project: ActiveMQ Artemis
>          Issue Type: Task
>          Components: Broker
>    Affects Versions: 2.7.0, 2.8.1, 2.9.0, 2.10.0, 2.10.1, 2.11.0, 2.12.0, 2.13.0, 2.14.0
>            Reporter: Francesco Nigro
>            Assignee: Francesco Nigro
>            Priority: Major
>             Fix For: 2.15.0
>
>
> Journal scalability with a replicated pair has degraded due to:
> * a semantic change on journal sync that was causing the Netty event loop on the backup to wait for any journal operation to hit the disk - see https://issues.apache.org/jira/browse/ARTEMIS-2837
> * a semantic change on NettyConnection::write when called from within the Netty event loop: it now immediately writes and flushes buffers, whereas it previously deferred the write by offering it back to the event loop (see the sketch after this list) - see https://issues.apache.org/jira/browse/ARTEMIS-2205 (in particular https://github.com/apache/activemq-artemis/commit/a40a459f8c536a10a0dccae6e522ec38f09dd544#diff-3477fe0d8138d589ef33feeea2ecd28eL377-L392)
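> Below is a minimal sketch of the difference between the two write strategies, assuming plain Netty Channel/EventLoop calls; the class and method names are illustrative only and are not the actual NettyConnection code:
> {code:java}
> import io.netty.buffer.ByteBuf;
> import io.netty.channel.Channel;
>
> final class WriteSemanticsSketch {
>
>    // post-ARTEMIS-2205 behavior: when already on the event loop, write and flush
>    // right away, paying a flush (and potentially a syscall) per packet
>    static void writeImmediately(Channel channel, ByteBuf buffer) {
>       if (channel.eventLoop().inEventLoop()) {
>          channel.writeAndFlush(buffer);
>       } else {
>          channel.eventLoop().execute(() -> channel.writeAndFlush(buffer));
>       }
>    }
>
>    // pre-2.7.0 behavior (roughly): defer the write by offering it back to the
>    // event loop instead of writing inline, so it happens after the current
>    // event-loop task has completed
>    static void writeDeferred(Channel channel, ByteBuf buffer) {
>       channel.eventLoop().execute(() -> channel.writeAndFlush(buffer));
>    }
> }
> {code}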
> The former issue has been solved by reverting the changes and reimplementing the new semantics behind a flag that switches between the two.
> The latter needs some more explanation to be understood:
> # ReplicationEndpoint is responsible for handling the packets coming from the live broker
> # Netty delivers incoming packets to ReplicationEndpoint in batches
> # after each processed packet coming from the live (which likely ends up appending something to the journal), a replication packet response needs to be sent back from the backup to the live: in the original behavior (< 2.7.0) the responses were delayed until the end of a processed batch of packets, which made the journal append records in bursts and amortized the full cost of waking the I/O thread responsible for appending data to the journal.
> To emulate the original "bursty" behavior, while making the batching more explicit (and tunable), the issue can be solved by:
> # using Netty's ChannelInboundHandler::channelReadComplete event to flush each batch of packet responses as before (a sketch of this follows the list)
> # [OPTIONAL] implementing a new append executor on the journal to further reduce the cost of waking the appending thread (see the second sketch below)
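> A minimal sketch of point 1, assuming a plain Netty inbound handler; the real Artemis classes differ and the handler name is hypothetical:
> {code:java}
> import io.netty.channel.ChannelHandlerContext;
> import io.netty.channel.ChannelInboundHandlerAdapter;
>
> final class BatchedResponseHandler extends ChannelInboundHandlerAdapter {
>
>    @Override
>    public void channelRead(ChannelHandlerContext ctx, Object packet) {
>       Object response = process(packet); // e.g. append to the journal and build the response
>       ctx.write(response);               // queue the response without flushing it yet
>    }
>
>    @Override
>    public void channelReadComplete(ChannelHandlerContext ctx) {
>       ctx.flush();                       // one flush per batch of packets read by Netty
>    }
>
>    private Object process(Object packet) {
>       // placeholder for the real replication handling
>       return packet;
>    }
> }
> {code}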
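> A rough sketch of the optional point 2, i.e. waking the appending thread once per burst instead of once per record; the names here are illustrative and not the actual journal API:
> {code:java}
> import java.util.concurrent.ConcurrentLinkedQueue;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> import java.util.concurrent.atomic.AtomicBoolean;
>
> final class BatchingAppendExecutor {
>
>    private final ConcurrentLinkedQueue<Runnable> pendingAppends = new ConcurrentLinkedQueue<>();
>    private final AtomicBoolean scheduled = new AtomicBoolean();
>    private final ExecutorService appendThread = Executors.newSingleThreadExecutor();
>
>    void submitAppend(Runnable append) {
>       pendingAppends.add(append);
>       // wake the appending thread only if a drain is not already scheduled
>       if (scheduled.compareAndSet(false, true)) {
>          appendThread.execute(this::drain);
>       }
>    }
>
>    private void drain() {
>       // clear the flag before draining: a concurrent submit can only cause a
>       // spurious extra wake-up, never a lost append
>       scheduled.set(false);
>       Runnable append;
>       while ((append = pendingAppends.poll()) != null) {
>          append.run();  // all appends accumulated during the burst run in one wake-up
>       }
>    }
> }
> {code}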



--
This message was sent by Atlassian Jira
(v8.3.4#803005)