Posted to issues@activemq.apache.org by "Domenico Francesco Bruscino (Jira)" <ji...@apache.org> on 2020/08/25 13:17:01 UTC

[jira] [Closed] (ARTEMIS-2877) Fix journal replication scalability

     [ https://issues.apache.org/jira/browse/ARTEMIS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Domenico Francesco Bruscino closed ARTEMIS-2877.
------------------------------------------------
    Resolution: Done

> Fix journal replication scalability 
> ------------------------------------
>
>                 Key: ARTEMIS-2877
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2877
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.7.0, 2.8.1, 2.9.0, 2.10.0, 2.10.1, 2.11.0, 2.12.0, 2.13.0, 2.14.0
>            Reporter: Francesco Nigro
>            Assignee: Francesco Nigro
>            Priority: Major
>             Fix For: 2.15.0
>
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Journal scalability with a replicated pair has degraded due to:
> * a semantic change on journal sync that caused the Netty event loop on the backup to wait for every journal operation to hit the disk - see https://issues.apache.org/jira/browse/ARTEMIS-2837
> * a semantic change on NettyConnection::write when called from within the Netty event loop: it now writes and flushes buffers immediately, whereas it previously deferred the write by offering it back to the event loop (sketched below) - see https://issues.apache.org/jira/browse/ARTEMIS-2205 (in particular https://github.com/apache/activemq-artemis/commit/a40a459f8c536a10a0dccae6e522ec38f09dd544#diff-3477fe0d8138d589ef33feeea2ecd28eL377-L392)
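> A minimal sketch of the two write semantics, assuming Netty's Channel API (the class and method names here are hypothetical, for illustration only):
> {code:java}
> import io.netty.buffer.ByteBuf;
> import io.netty.channel.Channel;
>
> final class WriteSemantics {
>     // Pre-ARTEMIS-2205 behavior: defer the write by offering it back to
>     // the event loop, so it runs after the current batch of reads.
>     static void deferredWrite(Channel channel, ByteBuf buffer) {
>         channel.eventLoop().execute(() -> channel.writeAndFlush(buffer));
>     }
>
>     // Post-ARTEMIS-2205 behavior: write and flush immediately from
>     // within the event loop, producing one flush per packet.
>     static void immediateWrite(Channel channel, ByteBuf buffer) {
>         channel.writeAndFlush(buffer);
>     }
> }
> {code}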
> The former issue has been solved by reverting the changes and reimplementing them without introducing any semantic change.
> The latter needs some more explanation:
> # ReplicationEndpoint is responsible for handling packets coming from the live broker
> # Netty delivers incoming packets to the ReplicationEndpoint in batches
> # after each processed packet from the live broker (which typically appends something to the journal), a replication packet response needs to be sent back from the backup to the live broker: in the original behavior (< 2.7.0) the responses were delayed and flushed to the connection only at the end of each processed batch of packets, causing the journal to append records in bursts and amortizing the full cost of waking the I/O thread responsible for appending data to the journal.
> To emulate the original "bursty" behavior, while making the batching more explicit (and tunable too), the fix is to:
> # use Netty's ChannelInboundHandler::channelReadComplete event to flush each batch of packet responses as before (see the sketch after this list)
> # [OPTIONAL] implement a new append executor on the journal to further reduce the cost of waking the appending thread
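> A minimal sketch of the first point, assuming Netty's ChannelInboundHandlerAdapter (the handler class and the process method are hypothetical placeholders, not the actual Artemis classes):
> {code:java}
> import io.netty.channel.ChannelHandlerContext;
> import io.netty.channel.ChannelInboundHandlerAdapter;
>
> public class BatchFlushingHandler extends ChannelInboundHandlerAdapter {
>     @Override
>     public void channelRead(ChannelHandlerContext ctx, Object packet) {
>         // Process the replication packet and enqueue the response
>         // without flushing: it only hits the socket on flush().
>         ctx.write(process(packet));
>     }
>
>     @Override
>     public void channelReadComplete(ChannelHandlerContext ctx) {
>         // Netty fires this once per batch of reads: flushing here sends
>         // all queued responses at once, restoring the "bursty" behavior.
>         ctx.flush();
>     }
>
>     private Object process(Object packet) {
>         // Hypothetical placeholder for the replication packet handling.
>         return packet;
>     }
> }
> {code}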



--
This message was sent by Atlassian Jira
(v8.3.4#803005)