Posted to server-dev@james.apache.org by "Tung TRAN (Jira)" <se...@james.apache.org> on 2021/06/15 08:47:00 UTC
[jira] [Commented] (JAMES-3599) Improve the design of the RabbitMQ eventbus

    [ https://issues.apache.org/jira/browse/JAMES-3599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363482#comment-17363482 ] 

Tung TRAN commented on JAMES-3599:
----------------------------------

I just want to confirm my understanding.
When we `Execute group 1,2,3`, if:
1 -> ok
2 -> ok
3 -> error
then we only use the `Group 3 retry exchange`; groups 1 and 2 are not retried.
Am I right?
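
The scenario in this question can be sketched in plain Java. This is only an illustration of the reading suggested by the issue description ("upon failure, we republish to the retry exchange of the incriminated listener"), not the James implementation: the class name, the `dispatch` method, and the `-retry-exchange` naming are all hypothetical, and no RabbitMQ client is involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch: none of these names come from the James code base.
public class GroupDispatchSketch {

    // Runs every group listener against one event, in registration order, and
    // returns the names of the retry exchanges that would receive a republished copy.
    static List<String> dispatch(String event, Map<String, Consumer<String>> groups) {
        List<String> retryExchanges = new ArrayList<>();
        for (Map.Entry<String, Consumer<String>> group : groups.entrySet()) {
            try {
                group.getValue().accept(event);
            } catch (RuntimeException e) {
                // Only the failing group is scheduled for retry (assumed exchange naming).
                retryExchanges.add(group.getKey() + "-retry-exchange");
            }
        }
        return retryExchanges;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the groups in insertion order: 1, 2, 3.
        Map<String, Consumer<String>> groups = new LinkedHashMap<>();
        groups.put("group-1", event -> { });
        groups.put("group-2", event -> { });
        groups.put("group-3", event -> { throw new IllegalStateException("listener failed"); });

        // prints [group-3-retry-exchange]
        System.out.println(dispatch("mailbox-event", groups));
    }
}
```

In this sketch a failure in group 3 does not stop or roll back groups 1 and 2, and only group 3 ends up in the returned list. Whether the real eventbus behaves this way is exactly what the question asks the reporter to confirm.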

> Improve the design of the RabbitMQ eventbus
> -------------------------------------------
>
>                 Key: JAMES-3599
>                 URL: https://issues.apache.org/jira/browse/JAMES-3599
>             Project: James Server
>          Issue Type: Task
>          Components: mailbox, rabbitmq
>    Affects Versions: 3.6.0
>            Reporter: Benoit Tellier
>            Priority: Major
>             Fix For: 3.7.0
>
>         Attachments: design_after.png, design_before.png, rabbitmq-management.png
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Mailing list discussion: https://www.mail-archive.com/server-dev@james.apache.org/msg70437.html
> I spent a bit of time digging into RabbitMQ performance and
> stability.
> Weeks ago I was surprised to discover the amount of work performed by
> the play.json library, and I could not quite explain why it was hogging
> 3% of CPU time and was the largest CPU consumer for mailbox events.
> RabbitMQ acks account for another 1.20% of CPU time.
> Investigating the RabbitMQ eventbus, I realized that events are routed
> to all group queues, dispatched and deserialized, then applied if
> relevant. Given 200 events/s and given that the JMAP server has 10
> groups, we end up deserializing 2000 events/s, even when an event is
> irrelevant to a group.
> As I recall, we wanted the event per group to be the unit of retry.
> Noble design goal.
> I think parallelizing groups is a non-goal: this kind of optimization
> would not improve response time as it is asynchronous, running in the
> background, and makes little sense at 1000s of requests per second.
> However, ending up having one queue per event is likely sub-optimal. I
> think the design can be improved by, in the nominal case, transmitting
> only one message to all groups. The receiving groups will then try to
> execute all groups. We can keep retries for individual groups (with
> their dedicated exchanges and queues): upon failure, we republish to the
> retry exchange of the failing listener. This makes the upgrade path
> easy too, as the group queue keeps being consumed. One would just need
> to do some unbindings...
> Note that such an evolution would:
>  - also enable us, if we want, to enforce some execution order for
> listeners, opening the way to fix things like JAMES-3561
> <https://issues.apache.org/jira/browse/JAMES-3561> ...
>  - serve as an inspiration for future eventBus implementations like
> the Pulsar one, hence getting feedback on the existing design is IMO
> useful.
> I will create a JIRA ticket holding the design proposal (schema) and how
> it differs from the previous one, as well as some RabbitMQ management
> screenshots.


