Posted to issues@activemq.apache.org by "Francesco Nigro (Jira)" <ji...@apache.org> on 2020/08/17 08:30:00 UTC

[jira] [Commented] (ARTEMIS-2877) Fix journal replication scalability

    [ https://issues.apache.org/jira/browse/ARTEMIS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17178837#comment-17178837 ] 

Francesco Nigro commented on ARTEMIS-2877:
------------------------------------------

I have opened a PR with the most recent changes, without implementing the new appender executor; that will be added in a separate PR.

> Fix journal replication scalability 
> ------------------------------------
>
>                 Key: ARTEMIS-2877
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2877
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.7.0, 2.8.1, 2.9.0, 2.10.0, 2.10.1, 2.11.0, 2.12.0, 2.13.0, 2.14.0
>            Reporter: Francesco Nigro
>            Assignee: Francesco Nigro
>            Priority: Major
>             Fix For: 2.15.0
>
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Journal scalability with a replicated pair has degraded due to:
> * a semantic change on journal sync that caused the Netty event loop on the backup to wait for any journal operation to hit the disk - see https://issues.apache.org/jira/browse/ARTEMIS-2837
> * a semantic change on NettyConnection::write when called from within the Netty event loop: it now writes and flushes buffers immediately, while it previously delayed the write by offering it again to the event loop - see https://issues.apache.org/jira/browse/ARTEMIS-2205 (in particular https://github.com/apache/activemq-artemis/commit/a40a459f8c536a10a0dccae6e522ec38f09dd544#diff-3477fe0d8138d589ef33feeea2ecd28eL377-L392)
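> A rough sketch of the second change, as a simplified illustration (this is not the actual NettyConnection code):
> {code:java}
> import io.netty.buffer.ByteBuf;
> import io.netty.channel.Channel;
>
> // Illustrative only: shows the semantic difference described above,
> // not the real Artemis implementation.
> public final class WriteSemanticsSketch {
>
>    // Old behavior: a write issued from the event loop was re-submitted to it,
>    // so it typically reached the socket only after the current batch of reads
>    // had been fully processed.
>    static void delayedWrite(Channel channel, ByteBuf buffer) {
>       channel.eventLoop().execute(() -> channel.writeAndFlush(buffer));
>    }
>
>    // New behavior: when already on the event loop the buffer is written and
>    // flushed immediately, i.e. one flush per packet.
>    static void immediateWrite(Channel channel, ByteBuf buffer) {
>       if (channel.eventLoop().inEventLoop()) {
>          channel.writeAndFlush(buffer);
>       } else {
>          channel.eventLoop().execute(() -> channel.writeAndFlush(buffer));
>       }
>    }
> }
> {code}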
> The former issue has been solved by reverting the changes and reimplementing them without introducing any semantic change.
> The latter needs some more explanation to be understood:
> # ReplicationEndpoint is responsible for handling the packets coming from the live broker
> # Netty provides incoming packets to ReplicationEndpoint in batches
> # after each processed packet coming from the live broker (which is likely to end up appending something to the journal), a replication packet response needs to be sent back from the backup to the live: in the original behavior (< 2.7.0) the responses were delayed and flushed to the connection only at the end of a processed batch of packets, causing the journal to append records in bursts and amortizing the full cost of waking the I/O thread responsible for appending data to the journal.
> To emulate the original "bursty" behavior, while making the batching more explicit (and tunable too), this can be addressed by:
> # using Netty's ChannelInboundHandler::channelReadComplete event to flush each batch of packet responses as before (see the sketch below)
> # [OPTIONAL] implementing a new append executor on the journal to further reduce the cost of waking the appending thread
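> A rough sketch of the first point, using a hypothetical handler name and generic packet/response types for illustration (not the actual ReplicationEndpoint code):
> {code:java}
> import io.netty.channel.ChannelHandlerContext;
> import io.netty.channel.ChannelInboundHandlerAdapter;
>
> // Illustrative only: names are hypothetical, not the real Artemis classes.
> public class BatchingReplicationHandlerSketch extends ChannelInboundHandlerAdapter {
>
>    @Override
>    public void channelRead(ChannelHandlerContext ctx, Object packet) {
>       Object response = process(packet); // e.g. append the replicated record to the journal
>       // write the response without flushing: it stays in Netty's outbound buffer
>       ctx.write(response);
>    }
>
>    @Override
>    public void channelReadComplete(ChannelHandlerContext ctx) {
>       // fired once Netty has delivered the whole batch of packets read from the
>       // socket: flush all pending responses in a single burst, as < 2.7.0 did
>       ctx.flush();
>    }
>
>    private Object process(Object packet) {
>       // placeholder for the per-packet replication work
>       return packet;
>    }
> }
> {code}
> This produces one flush per read batch rather than one per packet, which is what restores the "bursty" journal appends described above.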



--
This message was sent by Atlassian Jira
(v8.3.4#803005)